The Strings Package

字符串处理中一个常见的操作是将一个字符串分割成一个不同的子字符串组成的slice. 以便之后的处理——比如将slice中的字符串转换为数组,或者trim 空白符。

为了更好的了解如何使用strings package中的函数,我们重新回顾一个很小的例子,看看如何使用这些函数。 所有的函数请参考3.6和3.7。 让我们从分割一个字符串开始讲解

names := "Niccolò•Noël•Geoffrey•Amélie••Turlough•José"
fmt.Print("|")
for _, name := range strings.Split(names, "•") {
fmt.Printf("%s|", name)
}
fmt.Println()
// |Niccolò|Noël|Geoffrey|Amélie||Turlough|José|

代码的第一句是一个以项目符号分隔的字符串(包含一个空白字段). 我们通过strings.Split()分割这个字符串,这个函数接受一个要分割的字符串和用于分隔符,并且是分隔出所有的字符串(如果需要限定数量,则需要使用到string.SplitN()代替). 如果我们使用了 strings.SplitAfter(). 则输出如下所示

|Niccolò•|Noël•|Geoffrey•|Amélie•|•|Turlough•|José|

strings.SplitAfter()函数跟string.Split()函数一样,但它保留了分隔符,同样 strings.SplitAfterN()限定我们需要分割的次数。

如果我们需要使用多个不同分隔符,我们可以使用strings.FeildFunc()

for _, record := range []string{"László Lajtha*1892*1963",
    "Édouard Lalo\t1823\t1892", "José Ángel Lamas|1775|1814"} {
    fmt.Println(strings.FieldsFunc(record, func(char rune) bool {
        switch char {
        case '\t', '*', '|':
        return true
    }
    return false
    }))
}

strings.Field函数接受一个字符串(record变量) 和一个func(rune) bool签名的函数引用。 由于这个函数很小,同时也只用于这个地方,所以我们使用了一个匿名函数。 strings.FeildsFunc()函数会遍历record字符串中的每一个字符。 并将这个字符传递给第二个函数。如果第二个函数返回true. 则对字符串进行一次分割。 这里我们看到的分隔符为 tabs, *, |

我们可以使用strings.Replace()函数替换掉字符串中出现的字符。

names = " Antônio\tAndré\tFriedrich\t\t\tJean\t\tÉlisabeth\tIsabella \t"
names = strings.Replace(names, "\t", " ", -1)
fmt.Printf("|%s|\n", names)
// |·Antônio·André··Friedrich···Jean··Élisabeth·Isabella··|

strings.Replace()函数接受一个要替换的字符串, old子串(要被替换的子串), new string, 以及替换的次数(这里是-1, 表示替换全部)。

当我们读取的一个字符串来自于用户的输入或者外部的源文件。 我们需要去除掉一些前后空白。

fmt.Printf("|%s|\n", SimpleSimplifyWhitespace(names))
func SimpleSimplifyWhitespace(s string) string {
    return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}
//|Antônio·André·Friedrich·Jean·Élisabeth·Isabella|

strings.TrimSpace()函数返回一个复制的字符串,这个字符串被去除掉了前后空白. strings.Feilds()函数用任意数量的空白分隔字符串(字符串中每一项由一个或者多个空白分隔),并返回一个[]string. strings.Join()函数接受一个[]string. 和一个分隔符(可以为空字符串), 并返回组合后的字符串。

当然,我们也可以使用更加高效的方法

func SimplifyWhitespace(s string) string {
    var buffer bytes.Buffer
    skip := true
    for _, char := range s {
        if unicode.IsSpace(char) {
            if !skip {//前一个字符只能为非空白
                buffer.WriteRune(' ')
                skip = true
            }
        } else {
            buffer.WriteRune(char)
            skip = false
        }
    }
    s = buffer.String()
    if skip && len(s) > 0 {//如果句尾是一个空白
        s = s[:len(s)-1]
    }   
    return s
}

strings.Map()函数可以用于替换或者删除字符。 它接收两个参数,第一个是用于mapping的函数,函数签名为 func(rune) rune. 第二个参数为一个字符串。 mapping function在每个字符遍历时调用。 每一个字符的替换都是通过mapping functio的返回,如果mapping function返回一个负数,则这个字符被删除.

asciiOnly := func(char rune) rune {
    if char > 127 { 
        return '?'
    }
    return char
}
fmt.Println(strings.Map(asciiOnly, "Jérôme Österreich"))
// J?r?me·?sterreich

我们也可以直接删除non-ASCII字符,只需要使用 return -1 代替 return '?'.

Jrme·sterreich

我们之前使用for ... range loop来遍历一个字符串。同样也可以通过使用ReadRune()函数实现,只要这个类型实现了ReadRune()函数, 比如bufio.Reader类型

reader := strings.NewReader("Café")
for {
    char, size, err := reader.ReadRune()
    if err != nil { // might occur if the reader is reading a file
        if err == io.EOF { // finished without incident
            break
        }   
        panic(err) // a problem occurred
    }
    fmt.Printf("%U '%c' %d: % X\n", char, char, size, []byte(string(char)))
}

Output

U+0043·'C'·1:·43
U+0061·'a'·1:·61
U+0066·'f'·1:·66
U+00E9·'é'·2:·C3·A9

这段代码读取一个字符串,并且输出每一个字符的字符码,字符,字符有多少个字节,以及表示字符的字节。 大部分情况下,reader是操作于file的。 所以这里我们可以假设reader变量是通过调用bufio.NewReader()创建的。 但是在这里,reader是操作一个字符串,我们通过以下方式进行创建

reader := strings.NewReader("Café")

strings.NewReader 返回一个*strings.Reader指针. 它提供了部分的bufio.Reader中的函数。 如strings.Reader.Read(), strings.Reader.Reader.ReadByte(), strings.Reader.ReadRune(), strings.Reader.UnreadByte(), strings.Reader.UnreadRune()方法。

API

func Contains(s, substr string) bool

Contains returns true if substr is within s.

fmt.Println(strings.Contains("seafood", "foo")) //true
fmt.Println(strings.Contains("seafood", "bar"))  //false
fmt.Println(strings.Contains("seafood", "")) //true
fmt.Println(strings.Contains("", "")) //true

func ContainsAny(s, chars string) bool

ContainsAny returns true if any Unicode code points in chars are within s.

fmt.Println(strings.ContainsAny("team", "i"))  //false
fmt.Println(strings.ContainsAny("failure", "u & i"))  //true
fmt.Println(strings.ContainsAny("foo", "")) //false
fmt.Println(strings.ContainsAny("", "")) //false

func ContainsRune(s string, r rune) bool

ContainsRune returns true if the Unicode code point r is within s.

func Count(s, sep string) int

计算s中sep的个数

fmt.Println(strings.Count("cheese", "e")) //3
fmt.Println(strings.Count("five", "")) //5 before & after each rune

func EqualFold(s, t string) bool

忽略大写小比较s与t是否相等。

fmt.Println(strings.EqualFold("Go", "go")) //true

func Fields(s string) []string

通过一个或者多个连续的空白符(符合unicode.IsSpace定义), 分隔一个字符串. 并返回一个数组。 如果返回的是一个空的数组,则表示s只包含空白符

fmt.Printf("Fields are: %q", strings.Fields("  foo bar  baz   "))
//Fields are: ["foo" "bar" "baz"]

func FieldsFunc(s string, f func(rune) bool) []string

FieldsFunc会遍历每个字符,并且将字符传入到f函数,如果对这个字符符合要求,则f返回true, 则s在这个位置分割一次。如果把有的代码点都符合f(c)或者string为空,则返回一个空的数组。

f := func(c rune) bool {
    return !unicode.IsLetter(c) && !unicode.IsNumber(c)
}
fmt.Printf("Fields are: %q", strings.FieldsFunc("  foo1;bar2,baz3...", f))
//Fields are: ["foo1" "bar2" "baz3"]

func HasPrefix(s, prefix string) bool

HasPrefix tests whether the string s begins with prefix.

func HasPrefix(s, prefix string) bool

HasSuffix tests whether the string s ends with suffix.

//检测输入的文件是以.txt为后缀
strings.HasSuffix(os.Args[1], ".txt")

func Index(s, sep string) int

Index returns the index of the first instance of sep in s, or -1 if sep is not present in s.

fmt.Println(strings.Index("chicken", "ken")) //4
fmt.Println(strings.Index("chicken", "dmr")) //-1
line := "røde og gule sløjfer"
i := strings.Index(line, " ") // Get the index of the first space
firstWord := line[:i] // Slice up to the first space
j := strings.LastIndex(line, " ") // Get the index of the last space
lastWord := line[j+1:] // Slice from after the last space
fmt.Println(firstWord, lastWord) // Prints: røde sløjfer

func IndexAny(s, chars string) int

IndexAny returns the index of the first instance of any Unicode code point from chars in s, or -1 if no Unicode code point from chars is present in s.

fmt.Println(strings.IndexAny("chicken", "aeiouy")) //2 
fmt.Println(strings.IndexAny("crwth", "aeiouy")) //-1

func IndexByte(s string, c byte) int

IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.

func IndexFunc(s string, f func(rune) bool) int

IndexFunc returns the index into s of the first Unicode code point satisfying f(c), or -1 if none do.

f := func(c rune) bool {
    return unicode.Is(unicode.Han, c)
}
fmt.Println(strings.IndexFunc("Hello, 世界", f)) //7
fmt.Println(strings.IndexFunc("Hello, world", f)) //-1

func IndexRune(s string, r rune) int

indexRune returns the index of the first instance of the Unicode code point r, or -1 if rune is not present in s.

fmt.Println(strings.IndexRune("chicken", 'k')) //4
fmt.Println(strings.IndexRune("chicken", 'd')) //-1

func Join(a []string, sep string) string

Join concatenates the elements of a to create a single string. The separator string sep is placed between elements in the resulting string.

s := []string{"foo", "bar", "baz"}
fmt.Println(strings.Join(s, ", "))
//foo, bar, baz

func LastIndex(s, sep string) int

LastIndex returns the index of the last instance of sep in s, or -1 if sep is not present in s.

fmt.Println(strings.Index("go gopher", "go")) //0
fmt.Println(strings.LastIndex("go gopher", "go")) //3
fmt.Println(strings.LastIndex("go gopher", "rodent")) //-1

func LastIndexAny(s, chars string) int

LastIndexAny returns the index of the last instance of any Unicode code point from chars in s, or -1 if no Unicode code point from chars is present in s.

func LastIndexFunc(s string, f func(rune) bool) int

LastIndexFunc returns the index into s of the last Unicode code point satisfying f(c), or -1 if none do.

line := "rå tørt\u2028vær"
i := strings.IndexFunc(line, unicode.IsSpace) // i == 3
firstWord := line[:i]
j := strings.LastIndexFunc(line, unicode.IsSpace) // j == 9
_, size := utf8.DecodeRuneInString(line[j:]) // size == 3
lastWord := line[j+size:] // j + size == 12
fmt.Println(firstWord, lastWord) // Prints: rå vær

func Map(mapping func(rune) rune, s string) string

Map returns a copy of the string s with all its characters modified according to the mapping function. If mapping returns a negative value, the character is dropped from the string with no replacement.

rot13 := func(r rune) rune {
    switch {
    case r >= 'A' && r <= 'Z':
        return 'A' + (r-'A'+13)%26
    case r >= 'a' && r <= 'z':
        return 'a' + (r-'a'+13)%26
    }
    return r
}
fmt.Println(strings.Map(rot13, "'Twas brillig and the slithy gopher..."))
//'Gjnf oevyyvt naq gur fyvgul tbcure...

func Repeat(s string, count int) string

Repeat returns a new string consisting of count copies of the string s.

fmt.Println("ba" + strings.Repeat("na", 2)) //banana

func Replace(s, old, new string, n int) string

Replace returns a copy of the string s with the first n non-overlapping instances of old replaced by new. If old is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string. If n < 0, there is no limit on the number of replacements.

fmt.Println(strings.Replace("oink oink oink", "k", "ky", 2)) //oinky oinky oink
fmt.Println(strings.Replace("oink oink oink", "oink", "moo", -1)) //moo moo moo

func Split(s, sep string) []string

Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators. If sep is empty, Split splits after each UTF-8 sequence. It is equivalent to SplitN with a count of -1.

fmt.Printf("%q\n", strings.Split("a,b,c", ","))
fmt.Printf("%q\n", strings.Split("a man a plan a canal panama", "a "))
fmt.Printf("%q\n", strings.Split(" xyz ", ""))
fmt.Printf("%q\n", strings.Split("", "Bernardo O'Higgins"))
/*
["a" "b" "c"]
["" "man " "plan " "canal panama"]
[" " "x" "y" "z" " "]
[""]
*/

func SplitAfter(s, sep string) []string

SplitAfter slices s into all substrings after each instance of sep and returns a slice of those substrings. If sep is empty, SplitAfter splits after each UTF-8 sequence. It is equivalent to SplitAfterN with a count of -1.

fmt.Printf("%q\n", strings.SplitAfter("a,b,c", ",")) //["a," "b," "c"]

func SplitAfterN(s, sep string, n int) []string

SplitAfterN slices s into substrings after each instance of sep and returns a slice of those substrings. If sep is empty, SplitAfterN splits after each UTF-8 sequence. The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder. n == 0: the result is nil (zero substrings)
n < 0: all substrings

fmt.Printf("%q\n", strings.SplitAfterN("a,b,c", ",", 2)) //["a," "b,c"]

func SplitN(s, sep string, n int) []string

fmt.Printf("%q\n", strings.SplitN("a,b,c", ",", 2))
z := strings.SplitN("a,b,c", ",", 0)
fmt.Printf("%q (nil = %v)\n", z, z == nil)
/*
["a" "b,c"]
[] (nil = true)
*/

func Title(s string) string

Title returns a copy of the string s with all Unicode letters that begin words mapped to their title case.
BUG: The rule Title uses for word boundaries does not handle Unicode punctuation properly.

fmt.Println(strings.Title("her royal highness")) //Her Royal Highness

func ToLower(s string) string

ToLower returns a copy of the string s with all Unicode letters mapped to their lower case.

fmt.Println(strings.ToLower("Gopher")) //gopher

func ToLowerSpecial(_case unicode.SpecialCase, s string) string

ToLowerSpecial returns a copy of the string s with all Unicode letters mapped to their lower case, giving priority to the special casing rules.

func ToTitle(s string) string

ToTitle returns a copy of the string s with all Unicode letters mapped to their title case.

fmt.Println(strings.ToTitle("loud noises")) //LOUD NOISES
fmt.Println(strings.ToTitle("хлеб")) //ХЛЕБ

func ToTitleSpecial(_case unicode.SpecialCase, s string) string

ToTitleSpecial returns a copy of the string s with all Unicode letters mapped to their title case, giving priority to the special casing rules.

func ToUpper(s string) string

ToUpper returns a copy of the string s with all Unicode letters mapped to their upper case.

fmt.Println(strings.ToUpper("Gopher")) //GOPHER

func ToUpperSpecial(_case unicode.SpecialCase, s string) string

ToUpperSpecial returns a copy of the string s with all Unicode letters mapped to their upper case, giving priority to the special casing rules.

func Trim(s string, cutset string) string

Trim returns a slice of the string s with all leading and trailing Unicode code points contained in cutset removed.

fmt.Printf("[%q]", strings.Trim(" !!! Achtung! Achtung! !!! ", "! ")) 
//["Achtung! Achtung"]

func TrimFunc(s string, f func(rune) bool) string

TrimFunc returns a slice of the string s with all leading and trailing Unicode code points c satisfying f(c) removed.

func TrimLeft(s string, cutset string) string

TrimLeft returns a slice of the string s with all leading Unicode code points contained in cutset removed.

func TrimLeftFunc(s string, f func(rune) bool) string

TrimLeftFunc returns a slice of the string s with all leading Unicode code points c satisfying f(c) removed.

func TrimPrefix(s, prefix string) string

TrimPrefix returns s without the provided leading prefix string. If s doesn't start with prefix, s is returned unchanged.

var s = "Goodbye,, world!"
s = strings.TrimPrefix(s, "Goodbye,")
s = strings.TrimPrefix(s, "Howdy,")
fmt.Print("Hello" + s)//Hello, world!

func TrimRight(s string, cutset string) string

TrimRight returns a slice of the string s, with all trailing Unicode code points contained in cutset removed.

func TrimRightFunc(s string, f func(rune) bool) string

TrimRightFunc returns a slice of the string s with all trailing Unicode code points c satisfying f(c) removed.

func TrimSpace(s string) string

TrimSpace returns a slice of the string s, with all leading and trailing white space removed, as defined by Unicode.

fmt.Println(strings.TrimSpace(" \t\n a lone gopher \n\t\r\n"))
//a lone gopher

func TrimSuffix(s, suffix string) string

TrimSuffix returns s without the provided trailing suffix string. If s doesn't end with suffix, s is returned unchanged.

var s = "Hello, goodbye, etc!"
s = strings.TrimSuffix(s, "goodbye, etc!")
s = strings.TrimSuffix(s, "planet")
fmt.Print(s, "world!") //Hello, world!

type Reader

type Reader struct {
    // contains filtered or unexported fields
}

A Reader implements the io.Reader, io.ReaderAt, io.Seeker, io.WriterTo, io.ByteScanner, and io.RuneScanner interfaces by reading from a string.

func NewReader(s string) *Reader

NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only.

func (r *Reader) Len() int

Len returns the number of bytes of the unread portion of the string.

func (r *Reader) Read(b []byte) (n int, err error)

func (r *Reader) ReadAt(b []byte, off int64) (n int, err error)

func (r *Reader) ReadByte() (b byte, err error)

func (r *Reader) ReadRune() (ch rune, size int, err error)

func (r *Reader) Seek(offset int64, whence int) (int64, error)

func (r *Reader) Size() int64

Size returns the original length of the underlying string. Size is the number of bytes available for reading via ReadAt. The returned value is always the same and is not affected by calls to any other method.

func (r *Reader) UnreadByte() error

func (r *Reader) UnreadRune() error

func (r *Reader) WriteTo(w io.Writer) (n int64, err error)

type Replacer

type Replacer struct {
    // contains filtered or unexported fields
}

Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.

func NewReplacer(oldnew ...string) *Replacer

NewReplacer returns a new Replacer from a list of old, new string pairs. Replacements are performed in order, without overlapping matches.

r := strings.NewReplacer("<", "&lt;", ">", "&gt;")
fmt.Println(r.Replace("This is <b>HTML</b>!"))
//This is &lt;b&gt;HTML&lt;/b&gt;!

func (r *Replacer) Replace(s string) string

Replace returns a copy of s with all replacements performed.

func (r *Replacer) WriteString(w io.Writer, s string) (n int, err error)

WriteString writes s to w with all replacements performed.