import re
# compile re `ab*` and return pattern obj `p`
p = re.compile('ab*')
Method | Description |
---|---|
match() | Return a match object if the pattern matches at the beginning of the string; return None otherwise |
search() | Scan the entire string and return a match object for the first match; return None if nothing matches |
findall() | Return a list of all substrings that match the pattern |
finditer() | Return an iterator that yields a match object for each match |
The following examples use this pattern object:
import re
p = re.compile('[a-z]+')
m = p.match("python")
print(m) # <_sre.SRE_Match object at 0x01F3F9F8>
# "python" matches w/ [a-z]+
m = p.match("3 python")
print(m) # None
# The leading "3" does not match [a-z]+, so match() returns None
General template for match(), which returns either a match object or None:
p = re.compile(reg_exp)
m = p.match( 'string goes here' )
if m:
    print('Match found: ', m.group())
else:
    print('No match')
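The template above can be wrapped in a small helper. This is only a minimal sketch; the function name `describe_match` is hypothetical and not part of `re`.

```python
import re

def describe_match(pattern, text):
    """Report whether `pattern` matches at the start of `text`."""
    p = re.compile(pattern)
    m = p.match(text)
    if m:  # a match object is truthy, None is falsy
        return 'Match found: ' + m.group()
    return 'No match'

print(describe_match('[a-z]+', 'python'))    # Match found: python
print(describe_match('[a-z]+', '3 python'))  # No match
```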
Here search() returns the same result as match():
m = p.search("python")
print(m) # <_sre.SRE_Match object at 0x01F3FA68>
search() still returns a match object here, because it scans the entire string:
m = p.search("3 python")
print(m) # <_sre.SRE_Match object at 0x01F3FA30>
Notice the difference: match() only tries to match at the beginning of the string, while search() scans the entire string.
findall() returns a list of every substring that matches the pattern:
result = p.findall("life is too short")
print(result) # ['life', 'is', 'too', 'short']
finditer() returns an iterator. Each element it yields is a match object.
result = p.finditer("life is too short")
print(result) # <callable_iterator object at 0x01F5E390>
for r in result:
    print(r)
# <_sre.SRE_Match object at 0x01F3F9F8>
# <_sre.SRE_Match object at 0x01F3FAD8>
# <_sre.SRE_Match object at 0x01F3FAA0>
# <_sre.SRE_Match object at 0x01F3F9F8>
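Since each item from finditer() is a match object, the matched text and its position can be collected together. A minimal sketch (the variable name `words` is only illustrative):

```python
import re

p = re.compile('[a-z]+')

# Collect (matched text, (start, end)) for every match in the string.
words = [(m.group(), m.span()) for m in p.finditer("life is too short")]
print(words)
# [('life', (0, 4)), ('is', (5, 7)), ('too', (8, 11)), ('short', (12, 17))]
```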
Method | Role |
---|---|
group() | Return the matched string |
start() | Return the start index of the match |
end() | Return the end index of the match (exclusive) |
span() | Return a (start, end) tuple for the match |
Comparing match() and search():
m = p.match("python")
m.group() # 'python'
m.start() # 0 : Always 0 for match()
m.end() # 6
m.span() # (0, 6)
m = p.search("3 python")
m.group() # 'python'
m.start() # 2
m.end() # 8
m.span() # (2, 8)
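The indices from start(), end(), and span() follow Python's slicing convention (the end index is exclusive), so slicing the original string with them reproduces group(). A short sketch to check this:

```python
import re

p = re.compile('[a-z]+')
text = "3 python"
m = p.search(text)

start, end = m.span()
print(text[start:end])                # python
print(text[start:end] == m.group())   # True
print(end - start == len(m.group()))  # True
```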
Compile by module
Compile the pattern separately when you want to reuse the pattern object:
p = re.compile('[a-z]+', re.I)
m = p.match("python")
Otherwise, a module-level function compiles and matches in a single call, which is shorter to write:
m = re.match('[a-z]+', "python")
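A compiled pattern is convenient when the same expression is applied to many strings. A minimal sketch; the `samples` list is made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for every input.
p = re.compile('[a-z]+', re.I)
samples = ["python", "Python 3", "3 python", ""]

results = [bool(p.match(s)) for s in samples]
print(results)  # [True, True, False, False]

# The module-level call gives the same answer for a one-off check.
one_off = bool(re.match('[a-z]+', "Python 3", re.I))
print(one_off)  # True
```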
Compile options
Option | Shortcut | Role |
---|---|---|
DOTALL | S | . matches every character, including \n |
IGNORECASE | I | Ignore case when matching |
MULTILINE | M | Match line by line (^ and $ apply to each line) |
VERBOSE | X | Allow verbose mode (whitespace and comments in the pattern, for readability) |
Example: re.DOTALL (== re.S) makes . match every character, including the newline (\n).
w/o option:
import re
m = re.match('a.b', 'a\nb')
print(m) # None, since . does not match \n by default
w/ option:
import re
p = re.compile('a.b', re.DOTALL)
m = p.match('a\nb')
print(m) # <_sre.SRE_Match object at 0x01FCF3D8>
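The effect is easiest to see with `.*` on multi-line text. A small sketch, assuming a made-up three-line string:

```python
import re

text = "start\nmiddle\nend"

# Without DOTALL, . stops at the first newline.
plain = re.match('start.*', text).group()
# With DOTALL, . also consumes newlines, so .* runs to the end of the string.
dotall = re.match('start.*', text, re.DOTALL).group()

print(repr(plain))   # 'start'
print(repr(dotall))  # 'start\nmiddle\nend'
```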
Example: re.IGNORECASE (== re.I) matches regardless of case:
p = re.compile('[a-z]', re.I)
p.match('python') # <_sre.SRE_Match object at 0x01FCFA30>
p.match('Python') # <_sre.SRE_Match object at 0x01FCFA68>
p.match('PYTHON') # <_sre.SRE_Match object at 0x01FCF9F8>
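With findall() the difference is visible in the returned substrings. A minimal sketch on a made-up string:

```python
import re

text = "Python PYTHON python"

case_sensitive = re.findall('[a-z]+', text)
case_insensitive = re.findall('[a-z]+', text, re.I)

print(case_sensitive)    # ['ython', 'python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
```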
Example: re.MULTILINE (== re.M) changes how the ^ and $ metacharacters behave. w/o option:
import re
p = re.compile(r"^python\s\w+")
data = """python one
life is too short
python two
you need python
python three"""
print(p.findall(data)) # ['python one']
Use re.M if you want ^ to match at the start of each line, not only at the start of the entire string.
import re
p = re.compile(r"^python\s\w+", re.MULTILINE)
data = """python one
life is too short
python two
you need python
python three"""
print(p.findall(data)) # ['python one', 'python two', 'python three']
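The same option affects $ as well. A short sketch that grabs the last word of each line; the pattern `\w+$` is chosen only for illustration:

```python
import re

data = """python one
life is too short
python two"""

# Without re.M, $ matches only at the end of the whole string.
last_only = re.findall(r"\w+$", data)
# With re.M, $ also matches just before each newline.
per_line = re.findall(r"\w+$", data, re.M)

print(last_only)  # ['two']
print(per_line)   # ['one', 'short', 'two']
```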
Example: re.VERBOSE (== re.X). w/o option, a dense pattern is hard to read:
charref = re.compile(r'&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);')
w/ re.X:
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
In verbose mode, whitespace inside the pattern is ignored (except inside [ ]), and # can be used to comment each line.
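A quick way to confirm that the compact and verbose forms are the same pattern is to run both on a few inputs. A minimal sketch; the sample strings and the names `compact` and `verbose` are made up for illustration.

```python
import re

compact = re.compile(r'&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);')
verbose = re.compile(r"""
    &[#]                # Start of a numeric entity reference
    (
        0[0-7]+         # Octal form
      | [0-9]+          # Decimal form
      | x[0-9a-fA-F]+   # Hexadecimal form
    )
    ;                   # Trailing semicolon
""", re.VERBOSE)

def first_group(pattern, text):
    """Return group(1) of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

for sample in ["&#065;", "&#65;", "&#x41;", "&amp;"]:
    # Both patterns should agree on whether and what they match.
    print(sample, first_group(compact, sample), first_group(verbose, sample))
```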
Backslash (\) escapes for metacharacters
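\section: the regex engine reads \s as the whitespace class, so the pattern means [ \t\n\r\f\v]ection –> Not intended
\\section: the regex engine reads this as the literal text \section
However, in an ordinary Python string literal \\ becomes \, so the regex engine still receives \section: same error, due to Python’s literal rules. (Unix’s grep and vim work fine.)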
p = re.compile('\\section')
Using 4 \ works, but it is complicated to read:
p = re.compile('\\\\section')
To resolve this issue, Python offers raw string notation, as shown below:
p = re.compile(r'\\section')
If the RE contains no \, it is the same RE w/ or w/o the raw string prefix r.
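The equivalences above can be checked directly by comparing the string literals. A minimal sketch; the sample `text` is made up.

```python
import re

# Both literals denote the same pattern text: two backslashes, then "section".
same_pattern = '\\\\section' == r'\\section'
print(same_pattern)  # True

text = r'see \section for details'
p = re.compile(r'\\section')
m = p.search(text)
print(m.group())  # \section

# No backslash in the pattern, so the r prefix changes nothing.
no_backslash_same = '[a-z]+' == r'[a-z]+'
print(no_backslash_same)  # True
```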
Reference: https://wikidocs.net/4308