String Matching and Searching - Q&A - Alibaba Cloud Developer Community

If the text you want to match is a literal string, the basic string methods are usually all you need, such as str.find(), str.endswith(), str.startswith(), or similar:

>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> # Exact match
>>> text == 'yeah'
False
>>> # Match at start or end
>>> text.startswith('yeah')
True
>>> text.endswith('no')
False
>>> # Search for the location of the first occurrence
>>> text.find('no')
10
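One detail of the methods above is easy to miss: str.find() returns -1 instead of raising an error when the substring is absent, so the result needs an explicit check. A minimal sketch follows; `locate` is a hypothetical helper name used only for illustration, not something from the original answer.

```python
text = 'yeah, but no, but yeah, but no, but yeah'

def locate(haystack, needle):
    """Return the index of the first occurrence of needle, or None if absent.

    Hypothetical helper, for illustration only.
    """
    index = haystack.find(needle)  # str.find() returns -1 when nothing is found
    return None if index == -1 else index

print(locate(text, 'no'))     # 10
print(locate(text, 'maybe'))  # None
print('no' in text)           # True; `in` is enough when only membership matters
print(text.rfind('no'))       # 28, the start of the last occurrence
```

If only a yes/no answer is needed, the `in` operator is usually the simplest choice; find() and rfind() are useful when the position matters.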

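For more complicated matching, use regular expressions and the re module. To illustrate the basics, suppose you want to match dates written as digits, such as 11/27/2012. You can do it like this: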

>>> text1 = '11/27/2012'
>>> text2 = 'Nov 27, 2012'
>>> import re
>>> # Simple matching: \d+ means match one or more digits
>>> if re.match(r'\d+/\d+/\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+/\d+/\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
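Because \d+ accepts any run of digits, the pattern above also accepts strings that do not look like dates. The sketch below compares it with a tighter variant; the `strict` pattern and the sample strings are assumptions made for illustration and do not appear in the original answer.

```python
import re

loose = r'\d+/\d+/\d+'              # pattern from the text: any run of digits
strict = r'\d{1,2}/\d{1,2}/\d{4}'   # assumed tighter variant: 1-2, 1-2 and 4 digits

# Sample inputs, made up for this comparison
candidates = ['11/27/2012', '123456/7/8', 'Nov 27, 2012']

results = []
for candidate in candidates:
    row = (candidate,
           bool(re.match(loose, candidate)),
           bool(re.match(strict, candidate)))
    results.append(row)
    print(row)
```

Neither pattern checks that the month or day is in a valid range; that kind of validation is better done after the digits have been extracted.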

If you are going to perform many matches with the same pattern, precompile the pattern string into a pattern object first. For example:

>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if datepat.match(text2):
...     print('yes')
... else:
...     print('no')
...
no
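A compiled pattern pays off mostly when it is applied to many inputs. A small sketch of that usage, with a made-up list of sample lines:

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')  # compiled once, reused for every line

# Sample input, invented for this example
lines = ['11/27/2012', 'Nov 27, 2012', '3/13/2013 keynote']

# Keep only the lines that start with a numeric date
numeric = [line for line in lines if datepat.match(line)]
print(numeric)  # ['11/27/2012', '3/13/2013 keynote']
```

The module-level functions such as re.match() also cache recently used patterns internally, so the gain from compiling is usually modest; the named pattern object mainly keeps the code tidy.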

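match() always tries to find the match at the start of the string. To find every occurrence of a pattern anywhere in the string, use the findall() method instead. For example: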

>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']
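The difference between matching at the start and searching anywhere can be seen side by side. The sketch below also uses search() and fullmatch(), which are real methods of compiled patterns but are not mentioned in the answer above; they are included here only for comparison.

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

at_start = datepat.match(text)        # None: the text does not begin with a date
first = datepat.search(text).group()  # first occurrence anywhere in the text
everything = datepat.findall(text)    # every occurrence, as a list of strings

print(at_start)
print(first)
print(everything)

# match() only anchors the start, so trailing text is silently accepted
trailing = datepat.match('11/27/2012abcdef')
print(trailing.group())                       # '11/27/2012'
print(datepat.fullmatch('11/27/2012abcdef'))  # None: the whole string must match
```

When the goal is to validate that an entire string is a date, fullmatch() (or a pattern ending in $) is probably closer to the intent than match().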

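When defining a regular expression, it is common to introduce capture groups by enclosing parts of the pattern in parentheses. For example: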

>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

Capture groups often simplify later processing, because the contents of each group can be extracted individually. For example:

>>> m = datepat.match('11/27/2012')
>>> m
<_sre.SRE_Match object at 0x1005d2750>
>>> # Extract the contents of each group
>>> m.group(0)
'11/27/2012'
>>> m.group(1)
'11'
>>> m.group(2)
'27'
>>> m.group(3)
'2012'
>>> m.groups()
('11', '27', '2012')
>>> month, day, year = m.groups()
>>> # Find all matches (notice splitting into tuples)
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>> for month, day, year in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
2012-11-27
2013-3-13
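The captured groups are always strings, so a common next step is to convert them. A minimal sketch, assuming the dates are in month/day/year order as in the examples above, that turns each match into a datetime.date:

```python
import re
from datetime import date

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# findall() yields one (month, day, year) tuple of strings per match
dates = [date(int(year), int(month), int(day))
         for month, day, year in datepat.findall(text)]

iso = [d.isoformat() for d in dates]
print(iso)  # ['2012-11-27', '2013-03-13']
```

Unlike the str.format() version, isoformat() zero-pads the month and day. date() raises ValueError for impossible values such as month 13, which gives a basic validity check at no extra cost.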

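The findall() method searches the text and returns all matches as a list. If you would rather get the matches one at a time, use the finditer() method instead. For example: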

>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
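Because finditer() yields full match objects rather than plain strings or tuples, it also gives access to where each match was found. A short sketch using the same pattern and text:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

positions = []
for m in datepat.finditer(text):
    # group(0) is the whole match; span() is its (start, end) position in text
    positions.append((m.group(0), m.span()))
    print(m.group(0), m.span())
```

Each span can be used to slice the original text, for example text[9:19] for the first date.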
