Really advanced Perl RegEx reference


* Samples
Swap two items
s/(\S+)\s+(\S+)/$2 $1/

Search C identifier
m/[_A-Za-z][_A-Za-z0-9]*/
m/[_[:alpha:]][_[:alnum:]]*/

Empty Line
/^$/

Word
\b\w+\b
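
The samples above can be tried out with a short script. This is only a sketch: the input strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Swap the first two whitespace-separated items.
(my $swapped = "hello world") =~ s/(\S+)\s+(\S+)/$2 $1/;

# Collect every C identifier in a line of code.
my $code   = "int x1 = _tmp + 42;";
my @idents = $code =~ /[_A-Za-z][_A-Za-z0-9]*/g;

# Count the empty lines in a list of lines.
my @lines = ("one", "", "two");
my $empty = grep { /^$/ } @lines;

# Extract the words from a string.
my @words = "foo, bar!" =~ /\b\w+\b/g;

print "$swapped\n";   # world hello
print "@idents\n";    # int x1 _tmp
print "$empty\n";     # 1
print "@words\n";     # foo bar
```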

* Questions

* Reference
perlre (bytes and utf8)
regex.h (regcomp regexec regfree regerror) (single byte only)
java (unicode only)
python (bytes and unicode)

* Basic Structure

* Syntax
m/regex/ismx
s/regex/replacement/ismxg

* Flags
i case-insensitive
s single-line or dot-match-all (only affects .)
m multi-line (only ^ $)
x allows whitespace and comments in the pattern (Perl extension, not in POSIX regex.h)
g global substitution
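
A minimal sketch of what each flag changes, using a made-up two-line string. Each result is stored as 1 (matched) or 0 (did not match).

```perl
use strict;
use warnings;

my $text = "Foo\nbar";

my $i            = $text =~ /foo/i     ? 1 : 0;  # 1: case is ignored
my $dot_plain    = $text =~ /Foo.bar/  ? 1 : 0;  # 0: . does not match "\n"
my $dot_s        = $text =~ /Foo.bar/s ? 1 : 0;  # 1: /s lets . match "\n"
my $anchor_plain = $text =~ /^bar$/    ? 1 : 0;  # 0: ^ only matches at string start
my $anchor_m     = $text =~ /^bar$/m   ? 1 : 0;  # 1: /m lets ^ match after "\n"

# /x ignores literal whitespace and allows comments inside the pattern.
my $x = $text =~ /
    ^ foo   # first line, any case
    \n
    bar $   # second line
/xi ? 1 : 0;

(my $once = "a-b-c") =~ s/-/+/;    # a+b-c  (first match only)
(my $all  = "a-b-c") =~ s/-/+/g;   # a+b+c  (every match)

print "$i $dot_plain $dot_s $anchor_plain $anchor_m $x\n";  # 1 0 1 0 1 1
print "$once $all\n";                                       # a+b-c a+b+c
```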

* Alternations
m/ABC|XYZ/

* Sequence
m/ABC/

* Repetition
(greedy)
a? 0 or 1
a* 0 or more
a+ 1 or more
a{m} m
a{m,} m or more
a{m,n} m to n (inclusively)

(lazy)
a??
a*?
a+?
a{m}?
a{m,}?
a{m,n}?

Matching against the string "aa":
(a?)(a*)  => $1 = "a", $2 = "a"
(a??)(a*) => $1 = "",  $2 = "aa"
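
The difference between greedy and lazy quantifiers can be checked directly. In this sketch the "aa" case comes from the notes above, while the HTML-like string is an invented example.

```perl
use strict;
use warnings;

# Greedy a? takes one "a"; lazy a?? takes nothing and leaves both to a*.
my ($g1, $g2) = "aa" =~ /(a?)(a*)/;
my ($l1, $l2) = "aa" =~ /(a??)(a*)/;

# Greedy .+ runs to the last ">"; lazy .+? stops at the first one.
my $html = "<b>bold</b> and <i>italic</i>";
my ($greedy) = $html =~ /<(.+)>/;
my ($lazy)   = $html =~ /<(.+?)>/;

print "[$g1][$g2]\n";   # [a][a]
print "[$l1][$l2]\n";   # [][aa]
print "$greedy\n";      # b>bold</b> and <i>italic</i
print "$lazy\n";        # b
```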

* Atoms
Character = a b c
Character Class
Escape = \ followed by a non-alphanumeric character, such as \\, \+, \( (a digit is different: \1 is a backreference)
Meta Escape = \ followed by a letter [a-zA-Z]
Groups = (...)

* Character Class
[abc] [a-b] [^abc] [^abc0-9]
A - right after the opening [ and a ] right after the opening [ are treated as literals
[-a] = - or a
[^\-]

[[]
[]]
[ ]
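
A small sketch of the literal rules for - , ] and [ inside a character class. The test string is made up for illustration.

```perl
use strict;
use warnings;

my $s = "a-b]c[d";

my @dash    = $s =~ /[-a]/g;              # leading - is literal
my @close   = $s =~ /[]c]/g;              # leading ] is literal
my @open    = $s =~ /[[]/g;               # [ needs no escape inside a class
my $no_dash = join "", $s =~ /[^\-]/g;    # everything except -

print "@dash\n";     # a -
print "@close\n";    # ] c
print "@open\n";     # [
print "$no_dash\n";  # ab]c[d
```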

* Posix Character Class
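[[.a.]] collating symbol (POSIX; Perl recognizes the syntax but does not support it)
[[=a=]] equivalence class (POSIX; Perl recognizes the syntax but does not support it)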
[[:alpha:]]
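
POSIX named classes are only valid inside a bracketed class, so the brackets appear doubled. A short sketch with invented inputs:

```perl
use strict;
use warnings;

my @letters    = "a1 b2" =~ /[[:alpha:]]/g;    # letters only
my @not_digits = "a1-b2" =~ /[^[:digit:]]/g;   # negated named class

# Named classes can be mixed with ordinary characters such as _.
my $ident_ok = "_tmp1" =~ /^[_[:alpha:]][_[:alnum:]]*$/ ? 1 : 0;

print "@letters\n";     # a b
print "@not_digits\n";  # a - b
print "$ident_ok\n";    # 1
```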

* Meta
. anything except newlines (normal mode)
. anything (s mode, singleline, dotall)
^ start of string, or start of line (m mode)
$ end of string (or just before a final newline), or end of line (m mode)

* Meta Escape
\t \n \r \f \a \e
\0nn octal, \xnn hexadecimal
\cA control character (computed as ch ^ 0x40, so \cA is 0x01)
\cM carriage return (0x0D)
\N{name}
\l lowercase next char
\u uppercase next char
\L...\E lowercase until \E
\U...\E uppercase until \E
\Q...\E quote until \E
\w \W word char / non-word char
\s \S whitespace / non-whitespace
\d \D digit / non-digit
\b \B word boundary / non-boundary
\p{property}
\P{property}
\X combining character sequence
\C single byte (perl; removed in Perl 5.24)
\< start of word (emacs)
\> end of word (emacs)
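
A few of these escapes in action, as a sketch with made-up strings. The control-character check assumes an ASCII platform.

```perl
use strict;
use warnings;

# \Q...\E quotes regex metacharacters in an interpolated string.
my $literal = "1+1=2?";
my $found   = "is 1+1=2? yes" =~ /\Q$literal\E/ ? 1 : 0;
my $quoted  = quotemeta($literal);   # the function form of \Q

# \u uppercases the next character, \L lowercases the rest.
(my $name = "jOHN") =~ s/(\w+)/\u\L$1/;

# \cM is carriage return and \cA is 0x01 (assumes ASCII).
my $ctrl_ok = ("\cM" eq "\r" && ord("\cA") == 1) ? 1 : 0;

# \d+ picks out runs of digits.
my @nums = "a1 b22" =~ /\d+/g;

print "$found\n";    # 1
print "$quoted\n";   # 1\+1\=2\?
print "$name\n";     # John
print "$ctrl_ok\n";  # 1
print "@nums\n";     # 1 22
```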

* Groups
(abc) for capture group

* Special group
(?#comment)
(?imsx-imsx) embedded flags
(?:pattern) for non-capture
(?imsx-imsx:pattern) subpattern
(?=pattern) positive look ahead
(?!pattern) negative look ahead
(?<=pattern) positive look behind
(?<!pattern) negative look behind
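
Look-around groups test the text next to the current position without consuming it. The sketch below uses invented inputs to show look-ahead, look-behind and a flag scoped to a subpattern.

```perl
use strict;
use warnings;

# Look-behind plus look-ahead: insert commas at zero-width positions.
(my $pretty = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+$)/,/g;

# Negative look-ahead: "foo" not followed by "bar".
my ($word) = "foobar foobaz" =~ /(foo(?!bar)\w+)/;

# Negative look-behind: a number not preceded by a dollar sign.
my ($plain) = 'cost $5 or 7' =~ /(?<!\$)\b(\d+)/;

# Flag limited to a subpattern: only "foo" is case-insensitive.
my $scoped_hit  = "FOObar" =~ /(?i:foo)bar/ ? 1 : 0;
my $scoped_miss = "FOOBAR" =~ /(?i:foo)bar/ ? 1 : 0;

print "$pretty\n";                    # 1,234,567
print "$word\n";                      # foobaz
print "$plain\n";                     # 7
print "$scoped_hit $scoped_miss\n";   # 1 0
```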

* Reference for capture
m/(x)\1/
s/(x)/$1$1/
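
Inside the pattern a captured group is referenced as \1, while in the replacement it is $1. A minimal sketch with made-up strings:

```perl
use strict;
use warnings;

# \1 inside the pattern: find a word that is immediately repeated.
my ($dup) = "this is is a test" =~ /\b(\w+)\s+\1\b/;

# $1 in the replacement: double the captured character.
(my $doubled = "xyz") =~ s/(x)/$1$1/;

# The pattern from the notes matches two identical characters in a row.
my $has_xx = "axxb" =~ /(x)\1/ ? 1 : 0;

print "$dup\n";      # is
print "$doubled\n";  # xxyz
print "$has_xx\n";   # 1
```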

* Traditional vs Extended
\{m,n\} vs {m,n}
\(xxx\) vs (xxx)
Emacs still uses traditional regular expression syntax

* Special extension
\< start of word (emacs)
\> end of word (emacs)

* New Lines

\n \v \r \r\n \f \x85 \x{2028} \x{2029} \x1A
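
One common use of this list is normalizing mixed line endings to \n. The sketch below assumes Perl 5.10 or later, where \R matches \r\n as a unit or any single vertical-whitespace character. The explicit alternation is an equivalent spelled out by hand for this input, and the sample string is invented.

```perl
use strict;
use warnings;

my $raw = "a\r\nb\rc\x{2028}d\n";

# \R treats "\r\n" as one line break, so it is not turned into two.
(my $unix = $raw) =~ s/\R/\n/g;

# Hand-written alternative: try "\r\n" first, then the single characters.
(my $manual = $raw) =~ s/\r\n|[\r\f\x0B\x85\x{2028}\x{2029}]/\n/g;

my @lines = split /\n/, $unix;

print scalar(@lines), "\n";   # 4
print "@lines\n";             # a b c d
```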


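Author: 车东. Posted: 2004-09-20 22:09. Last updated: 2007-04-15 19:04
Copyright notice: this article may be reproduced, provided that the reproduction includes a hyperlink to the original source, the author information, and this copyright notice.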