The Strings Package
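A common operation in string processing is splitting a string into a slice of substrings for further processing, such as converting the strings in the slice to numbers or trimming whitespace.
To get a better feel for the functions in the strings package, we will walk through a few small examples that use them. For the complete list of functions, see 3.6 and 3.7. Let's begin by splitting a string.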
names := "Niccolò•Noël•Geoffrey•Amélie••Turlough•José"
fmt.Print("|")
for _, name := range strings.Split(names, "•") {
fmt.Printf("%s|", name)
}
fmt.Println()
// |Niccolò|Noël|Geoffrey|Amélie||Turlough|José|
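The first statement defines a bullet-separated string (which includes one empty field). We split it with strings.Split(), which takes the string to split and the separator to split on, and splits at every occurrence of the separator (to limit the number of splits, use strings.SplitN() instead). Had we used strings.SplitAfter(), the output would look like this: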
|Niccolò•|Noël•|Geoffrey•|Amélie•|•|Turlough•|José|
strings.SplitAfter() works like strings.Split() except that it keeps the separator. Likewise, strings.SplitAfterN() lets us limit the number of splits.
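To make the count-limited variants concrete, here is a minimal sketch that applies strings.SplitN() and strings.SplitAfterN() to a shortened version of the bullet-separated string. The helper names firstTwo and firstTwoAfter are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// firstTwo splits names at the first bullet only, so the second element
// holds the unsplit remainder. (Hypothetical helper for this illustration.)
func firstTwo(names string) []string {
	return strings.SplitN(names, "•", 2)
}

// firstTwoAfter does the same but keeps the separator on the first element.
func firstTwoAfter(names string) []string {
	return strings.SplitAfterN(names, "•", 2)
}

func main() {
	names := "Niccolò•Noël•Geoffrey"
	fmt.Printf("%q\n", firstTwo(names))      // ["Niccolò" "Noël•Geoffrey"]
	fmt.Printf("%q\n", firstTwoAfter(names)) // ["Niccolò•" "Noël•Geoffrey"]
}
```

With a count of 2 the string is cut once, which is handy when only the first field matters and the rest should be left untouched.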
To split on two or more different separator characters, we can use strings.FieldsFunc():
for _, record := range []string{"László Lajtha*1892*1963",
"Édouard Lalo\t1823\t1892", "José Ángel Lamas|1775|1814"} {
fmt.Println(strings.FieldsFunc(record, func(char rune) bool {
switch char {
case '\t', '*', '|':
return true
}
return false
}))
}
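strings.FieldsFunc() takes a string (the record variable here) and a reference to a function with the signature func(rune) bool. Because the function is tiny and used only in this one place, we pass an anonymous function. strings.FieldsFunc() iterates over every character in the string and passes each one to the function; whenever the function returns true, the string is split at that character. Here the separators are tab, *, and |.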
We can use strings.Replace() to replace occurrences of a substring within a string.
names = " Antônio\tAndré\tFriedrich\t\t\tJean\t\tÉlisabeth\tIsabella \t"
names = strings.Replace(names, "\t", " ", -1)
fmt.Printf("|%s|\n", names)
// |·Antônio·André·Friedrich···Jean··Élisabeth·Isabella··|
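strings.Replace() takes the string to work on, the old substring to be replaced, the new substring to put in its place, and the maximum number of replacements (here -1, meaning replace every occurrence).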
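When a string comes from user input or from an external source, we often need to strip its leading and trailing whitespace and collapse each internal run of whitespace into a single space.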
fmt.Printf("|%s|\n", SimpleSimplifyWhitespace(names))
func SimpleSimplifyWhitespace(s string) string {
return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}
//|Antônio·André·Friedrich·Jean·Élisabeth·Isabella|
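strings.TrimSpace() returns the string with its leading and trailing whitespace removed. strings.Fields() splits the string on runs of one or more whitespace characters and returns a []string. strings.Join() takes a []string and a separator (which may be the empty string) and returns the elements joined into a single string.
Of course, we can also do this more efficiently: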
func SimplifyWhitespace(s string) string {
var buffer bytes.Buffer
skip := true
for _, char := range s {
if unicode.IsSpace(char) {
if !skip { // the previous character was not whitespace
buffer.WriteRune(' ')
skip = true
}
} else {
buffer.WriteRune(char)
skip = false
}
}
s = buffer.String()
if skip && len(s) > 0 { // the result ends with a space that we wrote
s = s[:len(s)-1]
}
return s
}
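As a quick sanity check, the following sketch runs both versions on a few inputs and reports whether they agree. The function bodies are repeated from above so the program is self-contained; the test inputs are chosen for this illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"unicode"
)

func SimpleSimplifyWhitespace(s string) string {
	return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}

func SimplifyWhitespace(s string) string {
	var buffer bytes.Buffer
	skip := true
	for _, char := range s {
		if unicode.IsSpace(char) {
			if !skip { // the previous character was not whitespace
				buffer.WriteRune(' ')
				skip = true
			}
		} else {
			buffer.WriteRune(char)
			skip = false
		}
	}
	s = buffer.String()
	if skip && len(s) > 0 { // the result ends with a space that we wrote
		s = s[:len(s)-1]
	}
	return s
}

func main() {
	// Illustrative inputs: mixed whitespace, empty, whitespace only, no whitespace.
	for _, in := range []string{" Antônio\tAndré \t", "", "   ", "one"} {
		a, b := SimpleSimplifyWhitespace(in), SimplifyWhitespace(in)
		fmt.Printf("%q %q %v\n", a, b, a == b)
	}
}
```

Both functions rely on the same definition of whitespace (unicode.IsSpace), so they should agree on every input.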
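strings.Map() can be used to replace or delete characters. It takes two arguments: a mapping function with the signature func(rune) rune, and a string. The mapping function is called once for each character in the string, and each character is replaced by the value the function returns. If the mapping function returns a negative number, the character is dropped.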
asciiOnly := func(char rune) rune {
if char > 127 {
return '?'
}
return char
}
fmt.Println(strings.Map(asciiOnly, "Jérôme Österreich"))
// J?r?me·?sterreich
We could just as easily delete the non-ASCII characters by using return -1 instead of return '?':
// Jrme·sterreich
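The deleting variant can be sketched as a complete program. The name dropNonASCII is hypothetical; only strings.Map() and its negative-return convention come from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// dropNonASCII removes every rune above 127. Returning a negative value
// from the mapping function tells strings.Map to drop the character.
func dropNonASCII(s string) string {
	return strings.Map(func(char rune) rune {
		if char > 127 {
			return -1
		}
		return char
	}, s)
}

func main() {
	fmt.Println(dropNonASCII("Jérôme Österreich")) // Jrme sterreich
}
```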
Earlier we used a for ... range loop to iterate over the characters of a string. The same can be done by calling ReadRune() on any type that provides that method, such as bufio.Reader:
reader := strings.NewReader("Café")
for {
char, size, err := reader.ReadRune()
if err != nil { // might occur if the reader is reading a file
if err == io.EOF { // finished without incident
break
}
panic(err) // a problem occurred
}
fmt.Printf("%U '%c' %d: % X\n", char, char, size, []byte(string(char)))
}
Output
U+0043·'C'·1:·43
U+0061·'a'·1:·61
U+0066·'f'·1:·66
U+00E9·'é'·2:·C3·A9
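This code reads a string and prints each character's code point, the character itself, the number of bytes it occupies, and the bytes used to represent it. In most cases a reader operates on a file, so we could imagine that the reader variable was created by calling bufio.NewReader(). Here, however, the reader operates on a string, and we create it like this: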
reader := strings.NewReader("Café")
strings.NewReader() returns a *strings.Reader, which provides a subset of the methods of bufio.Reader: strings.Reader.Read(), strings.Reader.ReadByte(), strings.Reader.ReadRune(), strings.Reader.UnreadByte(), and strings.Reader.UnreadRune().
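For comparison, here is a minimal sketch of the same loop written against bufio.Reader. A strings.Reader stands in for the file that would normally be wrapped, and the helper name runeSizes is made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// runeSizes reads s rune by rune through a bufio.Reader and returns the
// number of bytes each rune occupies. (Hypothetical helper.)
func runeSizes(s string) ([]int, error) {
	// bufio.NewReader accepts any io.Reader; an *os.File would work the same way.
	reader := bufio.NewReader(strings.NewReader(s))
	var sizes []int
	for {
		_, size, err := reader.ReadRune()
		if err == io.EOF { // finished without incident
			return sizes, nil
		}
		if err != nil { // a real read problem
			return sizes, err
		}
		sizes = append(sizes, size)
	}
}

func main() {
	sizes, err := runeSizes("Café")
	fmt.Println(sizes, err) // [1 1 1 2] <nil>
}
```

Returning the error instead of panicking leaves the decision about how to handle a failed read to the caller.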
API
func Contains(s, substr string) bool
Contains returns true if substr is within s.
fmt.Println(strings.Contains("seafood", "foo")) //true
fmt.Println(strings.Contains("seafood", "bar")) //false
fmt.Println(strings.Contains("seafood", "")) //true
fmt.Println(strings.Contains("", "")) //true
func ContainsAny(s, chars string) bool
ContainsAny returns true if any Unicode code points in chars are within s.
fmt.Println(strings.ContainsAny("team", "i")) //false
fmt.Println(strings.ContainsAny("failure", "u & i")) //true
fmt.Println(strings.ContainsAny("foo", "")) //false
fmt.Println(strings.ContainsAny("", "")) //false
func ContainsRune(s string, r rune) bool
ContainsRune returns true if the Unicode code point r is within s.
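A short sketch of ContainsRune in use. The wrapper hasRune is hypothetical and exists only to keep the example easy to check.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRune reports whether the code point r occurs in s.
func hasRune(s string, r rune) bool {
	return strings.ContainsRune(s, r)
}

func main() {
	fmt.Println(hasRune("Café", 'é')) // true
	fmt.Println(hasRune("Café", 'e')) // false
}
```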
func Count(s, sep string) int
Count returns the number of non-overlapping instances of sep in s.
fmt.Println(strings.Count("cheese", "e")) //3
fmt.Println(strings.Count("five", "")) //5 before & after each rune
func EqualFold(s, t string) bool
EqualFold reports whether s and t are equal, ignoring case.
fmt.Println(strings.EqualFold("Go", "go")) //true
func Fields(s string) []string
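Fields splits the string s around each run of one or more consecutive whitespace characters, as defined by unicode.IsSpace, and returns a slice of the substrings. It returns an empty slice if s contains only whitespace.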
fmt.Printf("Fields are: %q", strings.Fields(" foo bar baz "))
//Fields are: ["foo" "bar" "baz"]
func FieldsFunc(s string, f func(rune) bool) []string
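FieldsFunc passes each character of s to f and splits s at each run of characters for which f returns true, returning a slice of the fields in between. If every code point in s satisfies f(c), or if the string is empty, an empty slice is returned.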
f := func(c rune) bool {
return !unicode.IsLetter(c) && !unicode.IsNumber(c)
}
fmt.Printf("Fields are: %q", strings.FieldsFunc(" foo1;bar2,baz3...", f))
//Fields are: ["foo1" "bar2" "baz3"]
func HasPrefix(s, prefix string) bool
HasPrefix tests whether the string s begins with prefix.
func HasSuffix(s, suffix string) bool
HasSuffix tests whether the string s ends with suffix.
// check whether the input file name ends in .txt
strings.HasSuffix(os.Args[1], ".txt")
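The snippet above depends on a command-line argument. A variant that runs on its own might look like the following sketch, where isTextFile and isHidden are hypothetical helpers and the file names are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// isTextFile reports whether name ends in ".txt".
func isTextFile(name string) bool {
	return strings.HasSuffix(name, ".txt")
}

// isHidden reports whether name starts with a dot.
func isHidden(name string) bool {
	return strings.HasPrefix(name, ".")
}

func main() {
	fmt.Println(isTextFile("notes.txt"), isTextFile("notes.md")) // true false
	fmt.Println(isHidden(".profile"), isHidden("profile"))       // true false
}
```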
func Index(s, sep string) int
Index returns the index of the first instance of sep in s, or -1 if sep is not present in s.
fmt.Println(strings.Index("chicken", "ken")) //4
fmt.Println(strings.Index("chicken", "dmr")) //-1
line := "røde og gule sløjfer"
i := strings.Index(line, " ") // Get the index of the first space
firstWord := line[:i] // Slice up to the first space
j := strings.LastIndex(line, " ") // Get the index of the last space
lastWord := line[j+1:] // Slice from after the last space
fmt.Println(firstWord, lastWord) // Prints: røde sløjfer
func IndexAny(s, chars string) int
IndexAny returns the index of the first instance of any Unicode code point from chars in s, or -1 if no Unicode code point from chars is present in s.
fmt.Println(strings.IndexAny("chicken", "aeiouy")) //2
fmt.Println(strings.IndexAny("crwth", "aeiouy")) //-1
func IndexByte(s string, c byte) int
IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.
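IndexByte is convenient when the separator is a single ASCII byte. The sketch below uses it to cut a string at the first colon; splitKeyValue and its inputs are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeyValue cuts s at the first ':' byte. ok is false when there is none.
func splitKeyValue(s string) (key, value string, ok bool) {
	i := strings.IndexByte(s, ':')
	if i < 0 {
		return "", "", false
	}
	return s[:i], s[i+1:], true
}

func main() {
	k, v, ok := splitKeyValue("host:example")
	fmt.Println(k, v, ok) // host example true
	_, _, ok = splitKeyValue("no separator")
	fmt.Println(ok) // false
}
```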
func IndexFunc(s string, f func(rune) bool) int
IndexFunc returns the index into s of the first Unicode code point satisfying f(c), or -1 if none do.
f := func(c rune) bool {
return unicode.Is(unicode.Han, c)
}
fmt.Println(strings.IndexFunc("Hello, 世界", f)) //7
fmt.Println(strings.IndexFunc("Hello, world", f)) //-1
func IndexRune(s string, r rune) int
IndexRune returns the index of the first instance of the Unicode code point r, or -1 if r is not present in s.
fmt.Println(strings.IndexRune("chicken", 'k')) //4
fmt.Println(strings.IndexRune("chicken", 'd')) //-1
func Join(a []string, sep string) string
Join concatenates the elements of a to create a single string. The separator string sep is placed between elements in the resulting string.
s := []string{"foo", "bar", "baz"}
fmt.Println(strings.Join(s, ", "))
//foo, bar, baz
func LastIndex(s, sep string) int
LastIndex returns the index of the last instance of sep in s, or -1 if sep is not present in s.
fmt.Println(strings.Index("go gopher", "go")) //0
fmt.Println(strings.LastIndex("go gopher", "go")) //3
fmt.Println(strings.LastIndex("go gopher", "rodent")) //-1
func LastIndexAny(s, chars string) int
LastIndexAny returns the index of the last instance of any Unicode code point from chars in s, or -1 if no Unicode code point from chars is present in s.
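A short sketch of LastIndexAny on the same "go gopher" string used for LastIndex. The wrapper lastAny is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// lastAny returns the byte index of the last character of s that also
// appears in chars, or -1 if there is none.
func lastAny(s, chars string) int {
	return strings.LastIndexAny(s, chars)
}

func main() {
	fmt.Println(lastAny("go gopher", "go"))     // 4
	fmt.Println(lastAny("go gopher", "rodent")) // 8
	fmt.Println(lastAny("go gopher", "fail"))   // -1
}
```

Note that chars is treated as a set of code points, not as a substring: "go" matches the last 'g' or 'o', whichever comes later.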
func LastIndexFunc(s string, f func(rune) bool) int
LastIndexFunc returns the index into s of the last Unicode code point satisfying f(c), or -1 if none do.
line := "rå tørt\u2028vær"
i := strings.IndexFunc(line, unicode.IsSpace) // i == 3
firstWord := line[:i]
j := strings.LastIndexFunc(line, unicode.IsSpace) // j == 9
_, size := utf8.DecodeRuneInString(line[j:]) // size == 3
lastWord := line[j+size:] // j + size == 12
fmt.Println(firstWord, lastWord) // Prints: rå vær
func Map(mapping func(rune) rune, s string) string
Map returns a copy of the string s with all its characters modified according to the mapping function. If mapping returns a negative value, the character is dropped from the string with no replacement.
rot13 := func(r rune) rune {
switch {
case r >= 'A' && r <= 'Z':
return 'A' + (r-'A'+13)%26
case r >= 'a' && r <= 'z':
return 'a' + (r-'a'+13)%26
}
return r
}
fmt.Println(strings.Map(rot13, "'Twas brillig and the slithy gopher..."))
//'Gjnf oevyyvt naq gur fyvgul tbcure...
func Repeat(s string, count int) string
Repeat returns a new string consisting of count copies of the string s.
fmt.Println("ba" + strings.Repeat("na", 2)) //banana
func Replace(s, old, new string, n int) string
Replace returns a copy of the string s with the first n non-overlapping instances of old replaced by new. If old is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string. If n < 0, there is no limit on the number of replacements.
fmt.Println(strings.Replace("oink oink oink", "k", "ky", 2)) //oinky oinky oink
fmt.Println(strings.Replace("oink oink oink", "oink", "moo", -1)) //moo moo moo
func Split(s, sep string) []string
Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators. If sep is empty, Split splits after each UTF-8 sequence. It is equivalent to SplitN with a count of -1.
fmt.Printf("%q\n", strings.Split("a,b,c", ","))
fmt.Printf("%q\n", strings.Split("a man a plan a canal panama", "a "))
fmt.Printf("%q\n", strings.Split(" xyz ", ""))
fmt.Printf("%q\n", strings.Split("", "Bernardo O'Higgins"))
/*
["a" "b" "c"]
["" "man " "plan " "canal panama"]
[" " "x" "y" "z" " "]
[""]
*/
func SplitAfter(s, sep string) []string
SplitAfter slices s into all substrings after each instance of sep and returns a slice of those substrings. If sep is empty, SplitAfter splits after each UTF-8 sequence. It is equivalent to SplitAfterN with a count of -1.
fmt.Printf("%q\n", strings.SplitAfter("a,b,c", ",")) //["a," "b," "c"]
func SplitAfterN(s, sep string, n int) []string
SplitAfterN slices s into substrings after each instance of sep and returns a slice of those substrings. If sep is empty, SplitAfterN splits after each UTF-8 sequence. The count determines the number of substrings to return:
n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
fmt.Printf("%q\n", strings.SplitAfterN("a,b,c", ",", 2)) //["a," "b,c"]
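func SplitN(s, sep string, n int) []string
SplitN slices s into substrings separated by sep and returns a slice of the substrings between those separators. The count n determines the number of substrings to return, in the same way as for SplitAfterN.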
fmt.Printf("%q\n", strings.SplitN("a,b,c", ",", 2))
z := strings.SplitN("a,b,c", ",", 0)
fmt.Printf("%q (nil = %v)\n", z, z == nil)
/*
["a" "b,c"]
[] (nil = true)
*/
func Title(s string) string
Title returns a copy of the string s with all Unicode letters that begin words mapped to their title case.
BUG: The rule Title uses for word boundaries does not handle Unicode punctuation properly.
fmt.Println(strings.Title("her royal highness")) //Her Royal Highness
func ToLower(s string) string
ToLower returns a copy of the string s with all Unicode letters mapped to their lower case.
fmt.Println(strings.ToLower("Gopher")) //gopher
func ToLowerSpecial(_case unicode.SpecialCase, s string) string
ToLowerSpecial returns a copy of the string s with all Unicode letters mapped to their lower case, giving priority to the special casing rules.
func ToTitle(s string) string
ToTitle returns a copy of the string s with all Unicode letters mapped to their title case.
fmt.Println(strings.ToTitle("loud noises")) //LOUD NOISES
fmt.Println(strings.ToTitle("хлеб")) //ХЛЕБ
func ToTitleSpecial(_case unicode.SpecialCase, s string) string
ToTitleSpecial returns a copy of the string s with all Unicode letters mapped to their title case, giving priority to the special casing rules.
func ToUpper(s string) string
ToUpper returns a copy of the string s with all Unicode letters mapped to their upper case.
fmt.Println(strings.ToUpper("Gopher")) //GOPHER
func ToUpperSpecial(_case unicode.SpecialCase, s string) string
ToUpperSpecial returns a copy of the string s with all Unicode letters mapped to their upper case, giving priority to the special casing rules.
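The Special variants matter for languages whose case rules differ from the default mapping. As a sketch, the following uses unicode.TurkishCase, where the dotted and dotless forms of "i" are distinct letters. The helper names turkishUpper and turkishLower are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// turkishUpper upper-cases s using the Turkish special casing rules.
func turkishUpper(s string) string {
	return strings.ToUpperSpecial(unicode.TurkishCase, s)
}

// turkishLower lower-cases s using the Turkish special casing rules.
func turkishLower(s string) string {
	return strings.ToLowerSpecial(unicode.TurkishCase, s)
}

func main() {
	fmt.Println(strings.ToUpper("i"), turkishUpper("i")) // I İ
	fmt.Println(strings.ToLower("I"), turkishLower("I")) // i ı
}
```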
func Trim(s string, cutset string) string
Trim returns a slice of the string s with all leading and trailing Unicode code points contained in cutset removed.
fmt.Printf("[%q]", strings.Trim(" !!! Achtung! Achtung! !!! ", "! "))
//["Achtung! Achtung"]
func TrimFunc(s string, f func(rune) bool) string
TrimFunc returns a slice of the string s with all leading and trailing Unicode code points c satisfying f(c) removed.
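A minimal sketch of TrimFunc that strips everything except letters and numbers from both ends of a string. The helper trimNonAlnum and the sample text are assumptions for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimNonAlnum removes leading and trailing runes that are neither
// letters nor numbers. Runes in the middle are left alone.
func trimNonAlnum(s string) string {
	return strings.TrimFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

func main() {
	fmt.Println(trimNonAlnum("¡¡¡Hello, Gophers!!!")) // Hello, Gophers
}
```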
func TrimLeft(s string, cutset string) string
TrimLeft returns a slice of the string s with all leading Unicode code points contained in cutset removed.
func TrimLeftFunc(s string, f func(rune) bool) string
TrimLeftFunc returns a slice of the string s with all leading Unicode code points c satisfying f(c) removed.
func TrimPrefix(s, prefix string) string
TrimPrefix returns s without the provided leading prefix string. If s doesn't start with prefix, s is returned unchanged.
var s = "Goodbye,, world!"
s = strings.TrimPrefix(s, "Goodbye,")
s = strings.TrimPrefix(s, "Howdy,")
fmt.Print("Hello" + s) //Hello, world!
func TrimRight(s string, cutset string) string
TrimRight returns a slice of the string s, with all trailing Unicode code points contained in cutset removed.
func TrimRightFunc(s string, f func(rune) bool) string
TrimRightFunc returns a slice of the string s with all trailing Unicode code points c satisfying f(c) removed.
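A point that is easy to miss: TrimLeft and TrimRight treat cutset as a set of characters, whereas TrimPrefix removes one exact string. The following sketch contrasts them; the helper names and the sample string are invented for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Thin wrappers so each call can be checked on its own. (Hypothetical.)
func leftSet(s string) string   { return strings.TrimLeft(s, "abc") }
func rightSet(s string) string  { return strings.TrimRight(s, "abc") }
func prefix(s string) string    { return strings.TrimPrefix(s, "abc") }
func rightDigits(s string) string {
	return strings.TrimRightFunc(s, unicode.IsDigit)
}

func main() {
	s := "abcabc-data-cba"
	fmt.Println(leftSet(s))  // -data-cba    (every leading a, b, or c removed)
	fmt.Println(rightSet(s)) // abcabc-data- (every trailing a, b, or c removed)
	fmt.Println(prefix(s))   // abc-data-cba (only one leading "abc" removed)
	fmt.Println(rightDigits("release-2024")) // release-
}
```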
func TrimSpace(s string) string
TrimSpace returns a slice of the string s, with all leading and trailing white space removed, as defined by Unicode.
fmt.Println(strings.TrimSpace(" \t\n a lone gopher \n\t\r\n"))
//a lone gopher
func TrimSuffix(s, suffix string) string
TrimSuffix returns s without the provided trailing suffix string. If s doesn't end with suffix, s is returned unchanged.
var s = "Hello, goodbye, etc!"
s = strings.TrimSuffix(s, "goodbye, etc!")
s = strings.TrimSuffix(s, "planet")
fmt.Print(s, "world!") //Hello, world!
type Reader
type Reader struct {
// contains filtered or unexported fields
}
A Reader implements the io.Reader, io.ReaderAt, io.Seeker, io.WriterTo, io.ByteScanner, and io.RuneScanner interfaces by reading from a string.
func NewReader(s string) *Reader
NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only.
func (r *Reader) Len() int
Len returns the number of bytes of the unread portion of the string.
func (r *Reader) Read(b []byte) (n int, err error)
func (r *Reader) ReadAt(b []byte, off int64) (n int, err error)
func (r *Reader) ReadByte() (b byte, err error)
func (r *Reader) ReadRune() (ch rune, size int, err error)
func (r *Reader) Seek(offset int64, whence int) (int64, error)
func (r *Reader) Size() int64
Size returns the original length of the underlying string. Size is the number of bytes available for reading via ReadAt. The returned value is always the same and is not affected by calls to any other method.
func (r *Reader) UnreadByte() error
func (r *Reader) UnreadRune() error
func (r *Reader) WriteTo(w io.Writer) (n int64, err error)
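Several of the Reader methods above are listed without examples. The sketch below exercises Len, ReadRune, UnreadRune, Seek, and ReadAt on the string "Café" (5 bytes). The three helper functions are hypothetical and exist only to keep each step checkable.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// unreadLens reports Len() before reading, after one ReadRune, and after
// UnreadRune has pushed that rune back.
func unreadLens(s string) (before, afterRead, afterUnread int) {
	r := strings.NewReader(s)
	before = r.Len()
	if _, _, err := r.ReadRune(); err != nil {
		return before, before, before
	}
	afterRead = r.Len()
	if err := r.UnreadRune(); err != nil {
		return before, afterRead, afterRead
	}
	return before, afterRead, r.Len()
}

// runeAtOffset seeks to byte offset off and decodes the rune starting there.
func runeAtOffset(s string, off int64) (rune, int, error) {
	r := strings.NewReader(s)
	if _, err := r.Seek(off, io.SeekStart); err != nil {
		return 0, 0, err
	}
	return r.ReadRune()
}

// bytesAt copies up to n bytes starting at off. ReadAt does not move the
// reader's current position.
func bytesAt(s string, off int64, n int) (string, error) {
	buf := make([]byte, n)
	got, err := strings.NewReader(s).ReadAt(buf, off)
	return string(buf[:got]), err
}

func main() {
	fmt.Println(unreadLens("Café")) // 5 4 5
	ch, size, _ := runeAtOffset("Café", 3)
	fmt.Printf("%c %d\n", ch, size) // é 2
	part, err := bytesAt("Café", 1, 2)
	fmt.Println(part, err) // af <nil>
}
```

Offsets passed to Seek and ReadAt are byte offsets, so seeking into the middle of a multi-byte character would not yield a valid rune.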
type Replacer
type Replacer struct {
// contains filtered or unexported fields
}
Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.
func NewReplacer(oldnew ...string) *Replacer
NewReplacer returns a new Replacer from a list of old, new string pairs. Replacements are performed in order, without overlapping matches.
r := strings.NewReplacer("<", "&lt;", ">", "&gt;")
fmt.Println(r.Replace("This is <b>HTML</b>!"))
//This is &lt;b&gt;HTML&lt;/b&gt;!
func (r *Replacer) Replace(s string) string
Replace returns a copy of s with all replacements performed.
func (r *Replacer) WriteString(w io.Writer, s string) (n int, err error)
WriteString writes s to w with all replacements performed.
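WriteString is useful when the result should go straight to a writer rather than into a new string. In this sketch a bytes.Buffer plays the role of the writer (os.Stdout or a file would work the same way), and escapeTo is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// escapeTo writes s to an in-memory buffer with < and > replaced by HTML
// entities, and returns the buffer contents plus the byte count reported
// by WriteString.
func escapeTo(s string) (string, int, error) {
	r := strings.NewReplacer("<", "&lt;", ">", "&gt;")
	var buf bytes.Buffer
	n, err := r.WriteString(&buf, s)
	return buf.String(), n, err
}

func main() {
	out, n, err := escapeTo("This is <b>HTML</b>!")
	fmt.Println(out)                // This is &lt;b&gt;HTML&lt;/b&gt;!
	fmt.Println(n == len(out), err) // true <nil>
}
```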