How do I filter a list of strings?

I have a list of strings that mix English and non-English words. I want to keep only the English words.

Example:


phrases = [
    "S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
    "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
    "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",
]

My code so far:

import re
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`. ,/\"]+")
for i in phrases:
    print(regex.sub(' ', i))

My output:

["S/O , .-4 , S/O Ashok Kumar, Block no.-4D.",
  "-15, 5. Street-15, sector -5, Civic Centre",
  ", , , , Bhilai, Durg. Bhilai, Chhattisgarh",]

My desired output:

["S/O Ashok Kumar, Block no.-4D.",
 "Street-15, sector -5, Civic Centre",
 "Bhilai, Durg. Bhilai, Chhattisgarh,"]

Answer from a uj5u.com user:

Judging by your data, it looks like you could use the following:

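# Requires the third-party "regex" package (pip install regex);
# the built-in re module does not support \p{Devanagari}.
import regex as re
lst = ["S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
       "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
       "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",]
for i in lst:
    # Drop everything up to the first word boundary after the last Devanagari character
    print(re.sub(r'^.*\p{Devanagari}.+?\b', '', i))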

Prints:

S/O Ashok Kumar, Block no.-4D.
Street-15, sector -5, Civic Centre
Bhilai, Durg. Bhilai, Chhattisgarh,

^ - start-of-string anchor;
.*\p{Devanagari} - 0+ (greedy) characters, up to the last Devanagari letter;
.+?\b - 1+ (lazy) characters, up to the first word boundary.
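
If installing the third-party regex package is not an option, a similar effect can be sketched with the standard re module by spelling out the Devanagari range by hand. This is only a rough sketch, assuming the Hindi text falls inside the Unicode block U+0900 to U+097F, which approximates but does not exactly equal the \p{Devanagari} script property. The helper name strip_devanagari_prefix and the sample string are illustrative and not taken from the question.

```python
import re

# Assumption: every Devanagari character of interest is in U+0900-U+097F.
# The explicit range stands in for \p{Devanagari}, which re does not support.
DEVANAGARI_PREFIX = re.compile(r'^.*[\u0900-\u097F].+?\b')


def strip_devanagari_prefix(text):
    """Remove everything up to the first word boundary after the last Devanagari character."""
    return DEVANAGARI_PREFIX.sub('', text)


# Hypothetical sample: the Hindi half is illustrative only.
sample = "भिलाई, दुर्ग, Bhilai, Durg."
print(strip_devanagari_prefix(sample))
```

A string with no Devanagari characters does not match the pattern and is returned unchanged.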

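Answer from a uj5u.com user:

If you mean that the characters you want to keep are just standard English letters, that your regex already handles this, and that you only want to filter out the problematic ", , , ," values, you could do the following: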

def format_output(current_output):
    results = []
    for row in current_output:
        # split on the ","
        sub_elements = row.split(",")
        # drop the elements that are empty or contain only whitespace
        filtered = list(filter(lambda x: len(x.strip()) > 0, sub_elements))
        # then join the elements together and append to the final results list
        results.append(",".join(filtered))
    return results
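
To see what this comma-based cleanup does and does not fix, here is a small self-contained sketch that applies the same idea to two lines of the asker's current output. The function name drop_empty_fields is made up for this illustration, and rejoining with ", " is a choice of this sketch rather than part of the answer above.

```python
def drop_empty_fields(row):
    """Split on commas, discard blank fields, and rejoin with ', '."""
    fields = [field.strip() for field in row.split(",")]
    return ", ".join(field for field in fields if field)


# A line whose leftovers are only blank fields cleans up fully.
print(drop_empty_fields(", , , , Bhilai, Durg. Bhilai, Chhattisgarh"))

# Leftovers that are not blank, such as "S/O" and ".-4", survive.
print(drop_empty_fields("S/O , .-4 , S/O Ashok Kumar, Block no.-4D."))
```

The second call shows the limit of the approach: fragments of punctuation and ASCII text left behind by the character filter are not empty, so they stay in the result.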

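Answer from a uj5u.com user:

It looks to me as though the first part of each element in the list is the Hindi translation of the second part, with a one-to-one correspondence in the number of words.

So for the examples you provided, and for any others that follow exactly the same pattern (it will break if they do not), all you have to do is take the second half of each element in the list.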

phrases = ["S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
  "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
  "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",]


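mod_list = []
for s in phrases:
    tmp_list = []
    # split the phrase into whitespace-separated words
    strg = s.split()
    n = len(strg)
    # keep only the second half of the words (the English part)
    for i in range(n // 2, n):
        tmp_list.append(strg[i])
    tmp_list = ' '.join(tmp_list)
    mod_list.append(tmp_list)

print(mod_list)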

Output:

['S/O Ashok Kumar, Block no.-4D.', 
'Street-15, sector -5, Civic Centre', 
'Bhilai, Durg. Bhilai, Chhattisgarh,']
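
Because this approach breaks silently when a phrase does not follow the mirrored pattern, it may help to check the assumption before trusting the result. The sketch below is one possible guard, not part of the answer above: the function name english_half is hypothetical, and treating "ASCII only" as a stand-in for "English" is an assumption that will not hold for every data set. The sample strings are illustrative.

```python
def english_half(phrase):
    """Return the second half of the words, or None if the mirrored-halves assumption fails."""
    words = phrase.split()
    # An odd word count cannot be split into two equal halves.
    if len(words) % 2:
        return None
    half = " ".join(words[len(words) // 2:])
    # Assumption: the English half contains ASCII characters only.
    return half if half.isascii() else None


# Hypothetical samples: the Hindi words are illustrative only.
print(english_half("भिलाई, दुर्ग, Bhilai, Durg."))
print(english_half("भिलाई Bhilai, Durg."))
```

Phrases that return None can then be routed to a different method, such as the regex-based one.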
