Regexp Package

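Go's regexp package is an implementation of Russ Cox's RE2 regular expression engine (code.google.com/p/re2/). RE2 is fast and thread-safe. Because the engine does not use backtracking, matching time is guaranteed to be linear, O(n), where n is the length of the string being matched; engines that do backtrack can take exponential time, $O(2^{n})$, in the worst case. The price of avoiding backtracking is that backreferences are not supported, although this can usually be worked around by making good use of the regexp API.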

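Table 3.12 lists the regexp package's functions, including the four that create *regexp.Regexp values. The escape sequences supported by the RE2 engine are shown in Table 3.13, the character classes in Table 3.14, the zero-width assertions in Table 3.15, the quantifiers in Table 3.16, and the flags in Table 3.17.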

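regexp.Regexp.ReplaceAll() and regexp.Regexp.ReplaceAllString() support both numbered and named replacements. A numbered replacement such as $1 refers to the first capturing group, (...); a named replacement refers to the group with that name, written (?P<name>...) in the regular expression. Replacements can be written bare (e.g., $2, $filename), but it is safer to delimit them with braces (e.g., ${2}, ${filename}). Use $$ to include a literal $ in the replacement string.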

Table 3.12 The Regexp Package's Functions

The variables p and s are both of type string; p is a regular expression pattern.

Syntax	Description/result
regexp.Match(p, b)	true and nil if p matches b of type []byte
regexp.MatchReader(p, r)	true and nil if p matches the text read by r of type io.RuneReader
regexp.MatchString(p, s)	true and nil if p matches s
regexp.QuoteMeta(s)	A string with all regexp metacharacters in s escaped; e.g., QuoteMeta(`[foo]`) returns `\[foo\]`
regexp.Compile(p)	A *regexp.Regexp and nil if p compiles successfully; otherwise nil and an error
regexp.CompilePOSIX(p)	A *regexp.Regexp and nil if p compiles successfully as POSIX ERE syntax; otherwise nil and an error
regexp.MustCompile(p)	A *regexp.Regexp if p compiles successfully; otherwise panics
regexp.MustCompilePOSIX(p)	A *regexp.Regexp if p compiles successfully; otherwise panics

Table 3.13 The Regexp Package’s Escape Sequences

Syntax	Description
\c	Literal character c; e.g., \* is a literal * rather than a quantifier
\000	Character with the given octal code point
\xHH	Character with the given 2-digit hexadecimal code point
\x{HHHH}	Character with the given 1–6-digit hexadecimal code point
\a	ASCII bell (BEL) ≡ \007
\f	ASCII formfeed (FF) ≡ \014
\n	ASCII linefeed (LF) ≡ \012
\r	ASCII carriage return (CR) ≡ \015
\t	ASCII tab (TAB) ≡ \011
\v	ASCII vertical tab (VT) ≡ \013
\Q...\E	Matches the ... text literally even if it contains characters like *

Table 3.14 The Regexp Package’s Character Classes

[chars]     Any character in chars, e.g., [abc]
[^chars]    Any character not in chars
[:name:]    Any ASCII character in the name character class:
            [[:alnum:]] ≡ [0-9A-Za-z]     [[:lower:]] ≡ [a-z]
            [[:alpha:]] ≡ [A-Za-z]        [[:print:]] ≡ [ -~]
            [[:ascii:]] ≡ [\x00-\x7F]     [[:punct:]] ≡ [!-/:-@[-`{-~]
            [[:blank:]] ≡ [ \t]           [[:space:]] ≡ [ \t\n\v\f\r]
            [[:cntrl:]] ≡ [\x00-\x1F\x7F] [[:upper:]] ≡ [A-Z]
            [[:digit:]] ≡ [0-9]           [[:word:]]  ≡ [0-9A-Za-z_]
            [[:graph:]] ≡ [!-~]           [[:xdigit:]] ≡ [0-9A-Fa-f]
[:^name:]   Any ASCII character not in the name character class
.           Any character (including \n if flag s is set)
\d          Any ASCII digit: [0-9]
\D          Any ASCII nondigit: [^0-9]
\s          Any ASCII whitespace: [ \t\n\f\r]
\S          Any ASCII nonwhitespace: [^ \t\n\f\r]
\w          Any ASCII word character: [0-9A-Za-z_]
\W          Any ASCII nonword character: [^0-9A-Za-z_]
\pN         Any Unicode character in the N one-letter character class
\PN         Any Unicode character not in the N one-letter character class
\p{Name}    Any Unicode character in the Name character class; e.g., \p{Greek} matches Greek characters and \p{Ll} matches lowercase letters
\P{Name}    Any Unicode character not in the Name character class

Table 3.15 The Regexp Package’s Zero-Width Assertions

Syntax	Description/result
^	Start of text (or start of line if flag m is set)
$	End of text (or end of line if flag m is set)
\A	Start of text
\z	End of text
\b	Word boundary (\w followed by \W or \A or \z; or vice versa)
\B	Not a word boundary

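Table 3.16 The Regexp Package’s Quantifiers

Syntax	Description/result
e? or e{0,1}	Greedily match zero or one occurrence of expression e
e+ or e{1,}	Greedily match one or more occurrences of expression e
e* or e{0,}	Greedily match zero or more occurrences of expression e
e{m,}	Greedily match at least m occurrences of expression e
e{,n}	Greedily match at most n occurrences of expression e (Go's parser appears to treat {,n} as literal text, so write e{0,n} instead)
e{m,n}	Greedily match at least m and at most n occurrences of expression e
e{m} or e{m}?	Match exactly m occurrences of expression e
e?? or e{0,1}?	Lazily match zero or one occurrence of expression e (prefers zero)
e+? or e{1,}?	Lazily match one or more occurrences of expression e
e*? or e{0,}?	Lazily match zero or more occurrences of expression e
e{m,}?	Lazily match at least m occurrences of expression e
e{,n}?	Lazily match at most n occurrences of expression e (write e{0,n}? in Go)
e{m,n}?	Lazily match at least m and at most n occurrences of expression e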

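Table 3.17 The Regexp Package’s Flags and Groups

Syntax	Description
i	Match case-insensitively
m	Multiline mode: ^ and $ match at the start and end of every line
s	Let . match any character, including \n
U	Swap greedy and non-greedy matching (e.g., e+ becomes lazy and e+? becomes greedy)
(?flags)	Apply the given flags from this point on, e.g., (?i); precede a flag with - to clear it, e.g., (?-i)
(?flags:e)	Apply the given flags to expression e only
(e)	Group and capture the match for expression e
(?P<name>e)	Group and capture the match for expression e, naming the capture name
(?:e)	Group but do not capture the match for expression e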

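A classic replacement example is converting names of the form forename1 ... forenameN surname into the form surname, forename1 ... forenameN. Here is how the regexp package can do this while correctly handling accented and other non-English characters.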

nameRx := regexp.MustCompile(`(\pL+\.?(?:\s+\pL+\.?)*)\s+(\pL+)`)
for i := 0; i < len(names); i++ {
    names[i] = nameRx.ReplaceAllString(names[i], "${2}, ${1}")
}

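The names variable is of type []string. It initially holds the original names; after the loop, each entry holds the rewritten name.

The regular expression matches one or more forenames separated by whitespace, where each forename consists of one or more Unicode letters (\pL) optionally followed by a period. The forenames are followed by whitespace and then the surname, which also consists of one or more Unicode letters.

Numbered replacements can cause maintenance problems: if a new capturing group is inserted in the middle of the expression, the existing numbers no longer refer to the right groups. Named replacements avoid this.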

nameRx := regexp.MustCompile(
    `(?P<fornames>\pL+\.?(?:\s+\pL+\.?)*)\s+(?P<surname>\pL+)`)
for i := 0; i < len(names); i++ {
    names[i] = nameRx.ReplaceAllString(names[i], "${surname}, ${fornames}")
}

Here both capturing groups are named, which makes the regular expression and the replacement string easier to understand.
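The named replacement can be tried out as a complete program. This is only a minimal sketch: the helper name surnameFirst and the sample names are assumptions made for illustration, while the pattern and replacement string are the ones shown above.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRx is the pattern from the text: one or more forenames made of
// Unicode letters (each optionally followed by a period), then a surname.
var nameRx = regexp.MustCompile(
	`(?P<fornames>\pL+\.?(?:\s+\pL+\.?)*)\s+(?P<surname>\pL+)`)

// surnameFirst is a hypothetical helper that rewrites
// "forenames surname" as "surname, forenames".
func surnameFirst(name string) string {
	return nameRx.ReplaceAllString(name, "${surname}, ${fornames}")
}

func main() {
	for _, name := range []string{"Émile Zola", "John R. R. Tolkien"} {
		fmt.Println(surnameFirst(name))
	}
}
```

Compiling the pattern once at package level avoids recompiling it on every call.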

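Matching a repeated "word" normally relies on backreferences; in Python or Perl the pattern would be \b(\w+)\s+\1\b. Go's regular expressions do not support backreferences, so the same effect has to be achieved with a little code: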

wordRx := regexp.MustCompile(`\w+`)
if matches := wordRx.FindAllString(text, -1); matches != nil {
    previous := ""
    for _, match := range matches {
        if match == previous {
            fmt.Println("Duplicate word:", match)
        }
        previous = match
    }
}
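The workaround above can be packaged as a small runnable sketch. The function names duplicateWords and backrefCompiles are hypothetical; the second one simply checks that the Perl-style backreference pattern is rejected by regexp.Compile.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordRx = regexp.MustCompile(`\w+`)

// duplicateWords returns every word that is immediately preceded by
// an identical word in text.
func duplicateWords(text string) []string {
	var dups []string
	previous := ""
	for _, word := range wordRx.FindAllString(text, -1) {
		if word == previous {
			dups = append(dups, word)
		}
		previous = word
	}
	return dups
}

// backrefCompiles reports whether the backreference pattern used in
// Perl or Python is accepted by Go's regexp package.
func backrefCompiles() bool {
	_, err := regexp.Compile(`\b(\w+)\s+\1\b`)
	return err == nil
}

func main() {
	fmt.Println(duplicateWords("it is is so so good"))
	fmt.Println("backreference pattern compiles:", backrefCompiles())
}
```

Unlike the backreference pattern, this approach compares consecutive \w+ runs only, so it also reports repeats that are separated by punctuation rather than whitespace.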

Another common use of regular expressions is matching key: value lines in a configuration file.

valueForKey := make(map[string]string)
keyValueRx := regexp.MustCompile(`\s*([[:alpha:]]\w*)\s*:\s*(.+)`)
if matches := keyValueRx.FindAllStringSubmatch(lines, -1); matches != nil {
    for _, match := range matches {
        valueForKey[match[1]] = strings.TrimRight(match[2], "\t ")
    }
}

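The regular expression skips any leading whitespace (without capturing it). The key must start with an ASCII letter, followed by zero or more letters, digits, or underscores (\w). Then come optional whitespace, a colon (:), and more optional whitespace, followed by the value: one or more of any character except newline (.+). Incidentally, [[:alpha:]] means [A-Za-z]. To support Unicode keys, the key could be written as (\pL[\pL\p{Nd}_]*): a Unicode letter followed by zero or more Unicode letters, decimal digits, or underscores.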

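regexp.Regexp.FindAllStringSubmatch() returns a slice of []string values, one per match. In each []string, the first item is the complete match, the second is the key (the first capturing group), and the third is the value (the second capturing group).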
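A self-contained sketch of the key: value parsing follows. The helper name parseKeyValues and the sample configuration text are assumptions made for illustration; the pattern is the one discussed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var keyValueRx = regexp.MustCompile(`\s*([[:alpha:]]\w*)\s*:\s*(.+)`)

// parseKeyValues is a hypothetical helper that collects every
// key: value pair found in lines into a map.
func parseKeyValues(lines string) map[string]string {
	valueForKey := make(map[string]string)
	for _, match := range keyValueRx.FindAllStringSubmatch(lines, -1) {
		// match[0] is the whole match, match[1] the key, match[2] the value.
		valueForKey[match[1]] = strings.TrimRight(match[2], "\t ")
	}
	return valueForKey
}

func main() {
	config := "host: example.org\n  port :8080  \n"
	fmt.Println(parseKeyValues(config))
}
```

Because .+ is greedy and matches trailing spaces and tabs, the value is trimmed on the right after matching.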

XML can be parsed with Go's xml.Decoder, but sometimes we only need to pick out XML-style attributes of the form name="value" or name='value'. For that, a simple regular expression is enough.

attrValueRx := regexp.MustCompile(regexp.QuoteMeta(attrName) +
    `=(?:"([^"]+)"|'([^']+)')`)
if indexes := attrValueRx.FindAllStringSubmatchIndex(attribs, -1);
    indexes != nil {
    for _, positions := range indexes {
        start, end := positions[2], positions[3]
        if start == -1 {
            start, end = positions[4], positions[5]
        }
        fmt.Printf("'%s'\n", attribs[start:end])
    }
}

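attrValueRx matches the escaped attribute name (regexp.QuoteMeta() puts a backslash before every special character; for example, fmt.Println(regexp.QuoteMeta("(?P:Hello) [a-z]")) prints \(\?P:Hello\) \[a-z\]), followed by an equals sign and a quoted value. The two quoting styles are alternatives separated by |. The parentheses around the alternation are needed only for grouping, not for capturing, so a non-capturing group, (?:...), is used. To show how it is done, the code retrieves index positions rather than the captured strings themselves. Each match yields three [start:end] index pairs: the first for the whole match, the second for a double-quoted value, and the third for a single-quoted value; the pair for the quoting style that was not used is -1 -1. For example: [[1 12 6 11 -1 -1] [13 22 18 21 -1 -1]].

Each match is a []int slice. The whole match is attribs[positions[0]:positions[1]], and the quoted string is attribs[positions[2]:positions[3]] or attribs[positions[4]:positions[5]], depending on whether double or single quotes were used.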
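The index handling described above can be sketched as a function that returns the attribute values instead of printing them. The name attrValues and the sample attribute string are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// attrValues is a hypothetical helper that returns the value of every
// attrName="value" or attrName='value' occurrence in attribs.
func attrValues(attribs, attrName string) []string {
	attrValueRx := regexp.MustCompile(regexp.QuoteMeta(attrName) +
		`=(?:"([^"]+)"|'([^']+)')`)
	var values []string
	for _, positions := range attrValueRx.FindAllStringSubmatchIndex(attribs, -1) {
		start, end := positions[2], positions[3] // double-quoted value
		if start == -1 {
			start, end = positions[4], positions[5] // single-quoted value
		}
		values = append(values, attribs[start:end])
	}
	return values
}

func main() {
	fmt.Println(attrValues(`name="alpha" name='beta' id="7"`, "name"))
}
```

Working with indexes avoids allocating a string for the whole match and for the unused group; only the slice of attribs that is actually wanted is taken.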

Earlier we wrote a function for simplifying whitespace, SimpleSimplifyWhitespace():

func SimpleSimplifyWhitespace(s string) string {
    return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}

Here the same result is achieved with a regular expression and strings.TrimSpace().

simplifyWhitespaceRx := regexp.MustCompile(`[\s\p{Zl}\p{Zp}]+`)
text = strings.TrimSpace(simplifyWhitespaceRx.ReplaceAllLiteralString(
    text, " "))

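regexp.Regexp.ReplaceAllLiteralString() takes the text to work on and a replacement string, and replaces every match with the replacement. Unlike ReplaceAllString(), it inserts the replacement literally, without expanding $ replacements. The code replaces every run of one or more whitespace characters (ASCII whitespace and the Unicode line and paragraph separators) with a single space.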
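Wrapped in a function, the whitespace simplification can be run directly. The function name simplifyWhitespace and the sample input are assumptions; U+2028 is the Unicode line separator covered by \p{Zl}.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var simplifyWhitespaceRx = regexp.MustCompile(`[\s\p{Zl}\p{Zp}]+`)

// simplifyWhitespace collapses every run of whitespace (including the
// Unicode line and paragraph separators) into one space and trims the ends.
func simplifyWhitespace(text string) string {
	return strings.TrimSpace(
		simplifyWhitespaceRx.ReplaceAllLiteralString(text, " "))
}

func main() {
	fmt.Printf("%q\n", simplifyWhitespace("  a \t\n b\u2028c  "))
}
```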

The last regular expression example performs replacements using a replacement function.

unaccentedLatin1Rx := regexp.MustCompile(
    `[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåæçèéêëìíîïñðòóôõöøùúûüýÿ]+`)
unaccented := unaccentedLatin1Rx.ReplaceAllStringFunc(latin1,
    UnaccentedLatin1)

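The regular expression simply matches one or more accented Latin letters. regexp.Regexp.ReplaceAllStringFunc() calls the function passed as its second argument, of type func(string) string, for every match; the function receives the matched text and returns the text to replace it with.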

func UnaccentedLatin1(s string) string {
    chars := make([]rune, 0, len(s))
    for _, char := range s {
        switch char {
        case 'À', 'Á', 'Â', 'Ã', 'Ä', 'Å':
            char = 'A'
        case 'Æ':
            chars = append(chars, 'A')
            char = 'E'
        // ...
        case 'ý', 'ÿ':
            char = 'y'
        }
        chars = append(chars, char)
    }
    return string(chars)
}
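Any func(string) string can serve as the replacement function, including ones from the standard library. As a smaller sketch of the same mechanism (the pattern and the name capitalizeWords are assumptions, not part of the example above), strings.ToUpper can be applied to the first ASCII lowercase letter of each word:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// initialRx matches an ASCII lowercase letter at the start of a word.
var initialRx = regexp.MustCompile(`\b[a-z]`)

// capitalizeWords passes every match to strings.ToUpper and
// substitutes the result.
func capitalizeWords(s string) string {
	return initialRx.ReplaceAllStringFunc(s, strings.ToUpper)
}

func main() {
	fmt.Println(capitalizeWords("hello go world"))
}
```

The returned text is inserted as-is, so a $ in it is never treated as a replacement variable.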

API

The regexp package provides 16 methods for matching a regular expression and identifying the matched text. Their names follow this pattern:

Find(All)?(String)?(Submatch)?(Index)?

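If All is present, the method finds successive non-overlapping matches of the entire expression rather than stopping at the first one. These methods take an extra argument n: if n >= 0, at most n matches are returned; if n < 0, all matches are returned.
If String is present, the argument is a string; otherwise it is a []byte slice.
If Submatch is present, the return value is a slice identifying the match and its successive submatches (capturing groups).
If Index is present, matches and submatches are identified by byte index pairs within the input rather than by their text.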
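To make the naming scheme concrete, the sketch below calls several variants on the same input. The pattern and input string are borrowed from the package examples later in this section; the variable name re is arbitrary.

```go
package main

import (
	"fmt"
	"regexp"
)

var re = regexp.MustCompile(`a(x*)b`)

func main() {
	s := "-axxb-ab-"
	fmt.Printf("%q\n", re.FindString(s))              // text of the first match
	fmt.Println(re.FindStringIndex(s))                // [start end] of the first match
	fmt.Printf("%q\n", re.FindStringSubmatch(s))      // first match and its groups
	fmt.Printf("%q\n", re.FindAllString(s, -1))       // text of every match
	fmt.Printf("%q\n", re.FindAllString(s, 1))        // at most one match
	fmt.Println(re.FindAllStringSubmatchIndex(s, -1)) // index pairs for every match and group
}
```

Dropping String from any of these names gives the []byte version, which takes and returns byte slices instead of strings.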

func Match(pattern string, b []byte) (matched bool, err error)

Match checks whether a textual regular expression matches a byte slice. More complicated queries need to use Compile and the full Regexp interface.

fmt.Println(regexp.Match("H.* ", []byte("Hello World!")))
//true <nil>

func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)

MatchReader checks whether a textual regular expression matches the text read by the RuneReader. More complicated queries need to use Compile and the full Regexp interface.

r := bytes.NewReader([]byte("Hello World!"))
fmt.Println(regexp.MatchReader("H.* ", r))
//true <nil>

func MatchString(pattern string, s string) (matched bool, err error)

matched, err := regexp.MatchString("foo.*", "seafood")
fmt.Println(matched, err)
matched, err = regexp.MatchString("bar.*", "seafood")
fmt.Println(matched, err)
matched, err = regexp.MatchString("a(b", "seafood")
fmt.Println(matched, err)

Output

true <nil>
false <nil>
false error parsing regexp: missing closing ): `a(b`

func QuoteMeta(s string) string

QuoteMeta returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text. For example, QuoteMeta(`[foo]`) returns `\[foo\]`.

fmt.Println(regexp.QuoteMeta("(?P:Hello) [a-z]"))
// \(\?P:Hello\) \[a-z\]

type Regexp

type Regexp struct {
// contains filtered or unexported fields
}

Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines.
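Since a compiled Regexp is safe for concurrent use, one package-level value can be shared by many goroutines without locking. The following is a minimal sketch of that pattern; the function name countNumbers and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// digitsRx is compiled once and shared by all goroutines.
var digitsRx = regexp.MustCompile(`\d+`)

// countNumbers counts the runs of digits in each input, processing the
// inputs concurrently. Each goroutine writes only to its own element
// of counts, so no extra synchronization is needed for the slice.
func countNumbers(inputs []string) []int {
	counts := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, input := range inputs {
		wg.Add(1)
		go func(i int, input string) {
			defer wg.Done()
			counts[i] = len(digitsRx.FindAllString(input, -1))
		}(i, input)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countNumbers([]string{"a1 b22", "none", "3 4 5"}))
}
```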

func Compile(expr string) (*Regexp, error)

Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.

When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see CompilePOSIX.

r, err := regexp.Compile(`Hello`)
if err != nil {
    fmt.Printf("There is a problem with your regexp.\n")
    return
}
// Prints "Match".
if r.MatchString("Hello Regular Expression.") {
    fmt.Printf("Match")
} else {
    fmt.Printf("No match ")
}

func CompilePOSIX(expr string) (*Regexp, error)

CompilePOSIX is like Compile but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.

That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.

However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See http://swtch.com/~rsc/regexp/regexp2.html#posix for details.

reg, err := regexp.CompilePOSIX(`[[:word:]]+`)
fmt.Printf("%q,%v\n", reg.FindString("Hello World!"), err)
s := "ABCDEEEEE"
rr := regexp.MustCompile(`ABCDE{2}|ABCDE{4}`)
rp := regexp.MustCompilePOSIX(`ABCDE{2}|ABCDE{4}`) //leftmost-longest match
fmt.Println(rr.FindAllString(s, 2))
fmt.Println(rp.FindAllString(s, 2))

Output

"Hello",<nil>
[ABCDEE]    <- first acceptable match
[ABCDEEEE]  <- But POSIX wants the longer match

func MustCompile(str string) *Regexp

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

if regexp.MustCompile(`Hello`).MatchString("Hello Regular Expression.") {
    fmt.Printf("Match ") //Match
} else {
    fmt.Printf("No match ")
}

var myre = regexp.MustCompile(`\d(+`)
fmt.Println(myre.MatchString("helloRegular Express."))

Output

panic: regexp: Compile(`\d(+`): error parsing regexp: missing argument to repetition operator: `+`
goroutine 1 [running]:
regexp.MustCompile(0x4de620, 0x4, 0x4148e8)
    go/src/pkg/regexp/regexp.go:207 +0x13f

func MustCompilePOSIX(str string) *Regexp

func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte

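Expand appends template to dst and returns the result. During the append, variables in the template are replaced with the corresponding matches drawn from src. The match slice should be one returned by FindSubmatchIndex().

In the template, a variable is written as $name or ${name}, where name is either a group name or a group number. In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and $10 is equivalent to ${10}, not ${1}0.

To insert a literal $ in the output, use $$ in the template.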

src := []byte(`call hello alice
    hello bob
    call hello eve
    `)

pat := regexp.MustCompile(`(?m)(call)\s+(?P<cmd>\w+)\s+(?P<arg>.+)\s*$`)
res := []byte{}
fmt.Println(pat.FindAllSubmatchIndex(src, -1))
for _, s := range pat.FindAllSubmatchIndex(src, -1) {
    res = pat.Expand(res, []byte("$cmd('$arg')\n"), src, s)
}
fmt.Println(string(res))
/*
[[0 16 0 4 5 10 11 16] [35 54 35 39 40 45 46 49]]
hello('alice')
hello('eve') // without (?m), only this line is output
*/
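The template rules for Expand ($name taken as long as possible, ${name} for explicit delimiting, $$ for a literal dollar sign) can be checked with ExpandString. This is a sketch only: the pattern, the sample input, and the helper name expand are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var emailRx = regexp.MustCompile(`(?P<user>\w+)@(?P<host>\w+)`)

// expand is a hypothetical helper that applies template to the first
// match of emailRx in src and returns the expanded text.
func expand(template, src string) string {
	match := emailRx.FindStringSubmatchIndex(src)
	if match == nil {
		return ""
	}
	return string(emailRx.ExpandString(nil, template, src, match))
}

func main() {
	src := "alice@example"
	fmt.Printf("%q\n", expand("${1}x", src))       // group 1 followed by a literal x
	fmt.Printf("%q\n", expand("$1x", src))         // read as ${1x}; no such group, so empty
	fmt.Printf("%q\n", expand("$host/$user", src)) // names end at the first non-word character
	fmt.Printf("%q\n", expand("cost: $$5", src))   // $$ yields a literal $
}
```

A reference to a group that does not exist expands to nothing rather than causing an error, which is why the braces are the safer habit.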

func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte

func (re *Regexp) Find(b []byte) []byte

Find returns a slice holding the text of the leftmost match in b of the regular expression. A return value of nil indicates no match.

reg := regexp.MustCompile(`\w+`)
fmt.Printf("%v", reg.Find([]byte("Hello World!")))
//[72 101 108 108 111]

func (re *Regexp) FindAll(b []byte, n int) [][]byte

FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllIndex(b []byte, n int) [][]int

FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllString(s string, n int) []string

FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

re := regexp.MustCompile("a.")
fmt.Println(re.FindAllString("paranormal", -1))
fmt.Println(re.FindAllString("paranormal", 2))
fmt.Println(re.FindAllString("graal", -1))
fmt.Println(re.FindAllString("none", -1))
/*
[ar an al]
[ar an]
[aa]
[]
*/

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int

func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string

FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

re := regexp.MustCompile("a(x*)b")
fmt.Printf("%q\n", re.FindAllStringSubmatch("-ab-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-axxb-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-ab-axb-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-axxb-ab-", -1))
/*
[["ab" ""]]
[["axxb" "xx"]]
[["ab" ""] ["axb" "x"]]
[["axxb" "xx"] ["ab" ""]]
*/

func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int

re := regexp.MustCompile("a(x*)b")
// Indices:
//    01234567   012345678
//    -ab-axb-   -axxb-ab-
fmt.Println(re.FindAllStringSubmatchIndex("-ab-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-axxb-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-ab-axb-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-axxb-ab-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-foo-", -1))
/*
[[1 3 2 2]]
[[1 5 2 4]]
[[1 3 2 2] [4 7 5 6]]
[[1 5 2 4] [6 8 7 7]]
[]
*/

func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte

FindAllSubmatch is the 'All' version of FindSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int

func (re *Regexp) FindIndex(b []byte) (loc []int)

func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)

func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int

func (re *Regexp) FindString(s string) string

FindString returns a string holding the text of the leftmost match in s of the regular expression. If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use FindStringIndex or FindStringSubmatch if it is necessary to distinguish these cases.

re := regexp.MustCompile("fo.?")
fmt.Printf("%q\n", re.FindString("seafood"))
fmt.Printf("%q\n", re.FindString("meat"))
/*
"foo"
""
*/

func (re *Regexp) FindStringIndex(s string) (loc []int)

FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s of the regular expression. The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.

s := "All is well that ends well"
//    012345678901234567890123456
//              1         2
r, err := regexp.Compile(`well$`)
fmt.Printf("%v", r.FindStringIndex(s)) // Prints [22 26]

r, err = regexp.Compile(`well`)
fmt.Printf("%v ", r.MatchString(s)) // true, but matches with first
                        // occurrence of 'well'
fmt.Printf("%v", r.FindStringIndex(s)) // Prints [7 11], the match starts at 7 and end before 11.

r, err = regexp.Compile(`ends$`)
fmt.Printf("%v ", r.MatchString(s)) // false, not at end of line.

s := "How much wood would a woodchuck chuck in Hollywood?"
//    012345678901234567890123456789012345678901234567890
//              10        20        30        40        50
//             -1--         -2--                    -3--
// Find words that *start* with wood
r, err := regexp.Compile(`\bwood`)              //    1      2
fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [22 26]]

// Find words that *end* with wood
r, err = regexp.Compile(`wood\b`)               //   1      3 
fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [46 50]]

// Find words that *start* and *end* with wood
r, err = regexp.Compile(`\bwood\b`)             //   1
fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13]]

func (re *Regexp) FindStringSubmatch(s string) []string

FindStringSubmatch returns a slice of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' description in the package comment. A return value of nil indicates no match.

re := regexp.MustCompile("a(x*)b(y|z)c")
fmt.Printf("%q\n", re.FindStringSubmatch("-axxxbyc-"))
fmt.Printf("%q\n", re.FindStringSubmatch("-abzc-"))
/*
["axxxbyc" "xxx" "y"]
["abzc" "" "z"]
*/

func (re *Regexp) FindStringSubmatchIndex(s string) []int

FindStringSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.

s := "a001bcccc, a001b, a001b"
//    012345678901234567890123
//              10        20
r, _ := regexp.Compile(`a(\d+)(\w+)`)
fmt.Printf("%v\n",r.FindStringSubmatchIndex(s))
fmt.Printf("%v\n",r.FindAllStringSubmatchIndex(s,-1))
/*
[0 9 1 4 4 9]
[[0 9 1 4 4 9] [11 16 12 15 15 16] [18 23 19 22 22 23]]
*/

func (re *Regexp) FindSubmatch(b []byte) [][]byte

func (re *Regexp) FindSubmatchIndex(b []byte) []int

func (re *Regexp) LiteralPrefix() (prefix string, complete bool)

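LiteralPrefix returns a literal string that must begin any match of the regular expression. It returns true for complete if the literal string comprises the entire regular expression.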

reg := regexp.MustCompile(`Hello[\w\s]+`)
fmt.Println(reg.LiteralPrefix())
// Hello false
reg = regexp.MustCompile(`Hello`)
fmt.Println(reg.LiteralPrefix())
// Hello true

func (re *Regexp) Longest()

Longest makes future searches prefer the leftmost-longest match.

text := `Hello World, 123 Go!`
pattern := `(?U)H[\w\s]+o` // the (?U) flag makes the quantifiers non-greedy
reg := regexp.MustCompile(pattern)
fmt.Printf("%q\n", reg.FindString(text))
// Hello
reg.Longest() // switch to leftmost-longest matching
fmt.Printf("%q\n", reg.FindString(text))
// Hello Wo

func (re *Regexp) Match(b []byte) bool

func (re *Regexp) MatchReader(r io.RuneReader) bool

func (re *Regexp) MatchString(s string) bool

MatchString reports whether the Regexp matches the string s.

func (re *Regexp) NumSubexp() int

NumSubexp returns the number of parenthesized subexpressions in this Regexp.

re := regexp.MustCompile("(?P<first_char>.)(?P<middle_part>.*)(?P<last_char>.)")
n1 := re.SubexpNames()
n2 := re.NumSubexp()
fmt.Printf("%#v\n",n1)
fmt.Printf("%#v", n2)
/*
[]string{"", "first_char", "middle_part", "last_char"}
3
*/

func (re *Regexp) ReplaceAll(src, repl []byte) []byte

ReplaceAll returns a copy of src, replacing matches of the Regexp with the replacement text repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.

func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte

ReplaceAllFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched byte slice. The replacement returned by repl is substituted directly, without using Expand.

func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte

ReplaceAllLiteral returns a copy of src, replacing matches of the Regexp with the replacement bytes repl. The replacement repl is substituted directly, without using Expand.

func (re *Regexp) ReplaceAllLiteralString(src, repl string) string

ReplaceAllLiteralString returns a copy of src, replacing matches of the Regexp with the replacement string repl. The replacement repl is substituted directly, without using Expand.

re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "${1}"))
/*
-T-T-
-$1-$1-
-${1}-${1}-
*/

func (re *Regexp) ReplaceAllString(src, repl string) string

ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.

re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
/*
-T-T-
--xx-
---
-W-xxW-
*/

func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string

ReplaceAllStringFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched substring. The replacement returned by repl is substituted directly, without using Expand.

unaccentedLatin1Rx := regexp.MustCompile(
    `[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåæçèéêëìíîïñðòóôõöøùúûüýÿ]+`)
unaccented := unaccentedLatin1Rx.ReplaceAllStringFunc(latin1,
    UnaccentedLatin1)

func UnaccentedLatin1(s string) string {
    chars := make([]rune, 0, len(s))
    for _, char := range s {
        switch char {
        case 'À', 'Á', 'Â', 'Ã', 'Ä', 'Å':
            char = 'A'
        case 'Æ':
            chars = append(chars, 'A')
            char = 'E'
        // ...
        case 'ý', 'ÿ':
            char = 'y'
        }
        chars = append(chars, char)
    }
    return string(chars)
}

func (re *Regexp) Split(s string, n int) []string

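Split slices s into substrings separated by the matches of re and returns a slice of the substrings between those matches. The matched text itself is not included in the returned slice.

If the Regexp contains no metacharacters (such as \w, \d, *, +), that is, if it is purely literal, Split is equivalent to strings.SplitN.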

s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
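The effect of the count argument n can be seen in a short sketch. The separator pattern, the input, and the helper name splitFields are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// commaRx matches a comma and any whitespace that follows it.
var commaRx = regexp.MustCompile(`,\s*`)

// splitFields is a hypothetical helper that splits s on commaRx,
// returning at most n substrings when n > 0.
func splitFields(s string, n int) []string {
	return commaRx.Split(s, n)
}

func main() {
	s := "a, b,c"
	fmt.Printf("%q\n", splitFields(s, -1)) // all substrings
	fmt.Printf("%q\n", splitFields(s, 2))  // the last substring is the unsplit remainder
	fmt.Printf("%q\n", splitFields(s, 0))  // nil: zero substrings
}
```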

func (re *Regexp) String() string

String returns the source text used to compile the regular expression.

func (re *Regexp) SubexpNames() []string

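SubexpNames returns the names of the parenthesized subexpressions in the regular expression. The name of the first subexpression is names[1], so that if m is a match slice, the name for m[i] is SubexpNames()[i]. Since only subexpressions can be named, and not the regular expression as a whole, names[0] is always the empty string.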

re := regexp.MustCompile("(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)")
fmt.Println(re.MatchString("Alan Turing"))
fmt.Printf("%q\n", re.SubexpNames())
reversed := fmt.Sprintf("${%s} ${%s}", re.SubexpNames()[2], re.SubexpNames()[1])
fmt.Println(reversed)
fmt.Println(re.ReplaceAllString("Alan Turing", reversed))
/*
true
["" "first" "last"]
${last} ${first}
Turing Alan
*/

re := regexp.MustCompile("(?P<first_char>.)(?P<middle_part>.*)(?P<last_char>.)")
n1 := re.SubexpNames()
r2 := re.FindAllStringSubmatch("Super", -1)[0]
md :=map[string]string{}
for i, n := range r2 {
    fmt.Printf("%d. match='%s'\tname='%s'\n", i, n, n1[i])
    md[n1[i]] = n
}
fmt.Printf("The names are: %#v\n", n1)
fmt.Printf("The matches are: %v\n", r2)
fmt.Printf("The first character is %s\n", md["first_char"])
fmt.Printf("The last character is %s\n", md["last_char"])

Output

0. match='Super'    name=''
1. match='S'    name='first_char'
2. match='upe'    name='middle_part'
3. match='r'    name='last_char'
The names are: []string{"", "first_char", "middle_part", "last_char"}
The matches are: [Super S upe r]
The first character is S
The last character is r