Swift Strings and Characters

Author: Harder | Published: 2015-06-19 11:56 | Read 727 times


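A string is an ordered collection of characters, such as "hello, world" or "albatross". Swift strings are represented by the String type, which in turn is made up of a series of values of the Character type.

Swift's String and Character types provide a fast, Unicode-compliant way to work with text. The syntax for creating and manipulating strings is lightweight and readable, with a string literal syntax similar to C. Concatenating two strings is as simple as combining them with the + operator, much like arithmetic. Whether a string can be modified depends on whether it is a constant or a variable, the same rule that applies to every other value in Swift.

Despite this simple syntax, Swift's String type is a fast, modern string implementation. Every string is composed of encoding-independent Unicode characters and can be accessed through several Unicode representations.

String interpolation inserts constants, variables, literals, and expressions into a longer string. This makes it easy to build custom strings for display, storage, and printing.

Note: Swift's String type is bridged with Foundation's NSString class. Once a String is cast to NSString, the whole NSString API in Foundation is available on it; a String value can also be passed to any API that takes an NSString parameter.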
<pre><code>
import Foundation
let stringSwift = "this is a Swift string"
let stringCocoa = stringSwift as NSString
stringCocoa.length
</code></pre>
For more information about using Swift with Foundation and Cocoa, see Using Swift with Cocoa and Objective-C.

String Literals

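You can include predefined String values in your code as string literals. A string literal is a fixed sequence of text characters surrounded by double quotes.

Use a string literal as the initial value of a constant or variable: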
<pre><code>
let someString = "Some string literal value"
</code></pre>

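In the code above, Swift infers that someString is a constant of type String, because it is initialized with a string literal.

Note: For how to use special characters in string literals, see Special Characters in String Literals.

Initializing an Empty String

Creating an empty string is often the starting point for building a longer one. To do so, either assign an empty string literal or create a new String instance with initializer syntax: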
<pre><code>
var emptyString = "" // empty string literal
var anotherEmptyString = String() // initializer syntax
// these two strings are both empty, and are equivalent to each other
</code></pre>

To find out whether a String value is empty, check its Boolean isEmpty property:
<pre><code>
if emptyString.isEmpty {
    print("Nothing to see here")
}
// prints "Nothing to see here"
</code></pre>

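String Mutability

Whether a string can be modified depends on whether it is assigned to a variable (in which case it can be modified) or to a constant (in which case it cannot):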
<pre><code>
var variableString = "Horse"
variableString += " and carriage"
// variableString is now "Horse and carriage"

let constantString = "Highlander"
constantString += " and another Highlander"
// this reports a compile-time error - a constant string cannot be modified
</code></pre>

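Note: This approach differs from string mutation in Objective-C and Cocoa, where you choose between two classes (NSString for immutable strings and NSMutableString for mutable ones).

Strings Are Value Types

When a String value is passed to a function or method, or assigned to a constant or variable, it is copied. In each case a new copy of the existing value is created, and the copy, not the original, is passed or assigned. Value types are described in Structures and Enumerations Are Value Types.

Note: This behavior differs from NSString in Cocoa. When you create an NSString instance in Cocoa and pass it to a function or method, or assign it to a variable, you are always passing or assigning a reference to the same single NSString. No copy is made unless you specifically request one.

Swift's copy-by-default String behavior ensures that when a function or method passes you a String value, you own that exact value regardless of where it came from. The string you are given will not be modified unless you modify it yourself.

Behind the scenes, Swift's compiler optimizes string usage so that actual copying takes place only when it is really needed. This means you still get good performance when working with strings as value types.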
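
The copy behavior described above can be observed directly. The following is a minimal sketch, assuming a current Swift toolchain; the names `original` and `copy` are illustrative and do not come from the original notes:

```swift
// Assigning a String produces an independent value.
let original = "Horse"
var copy = original          // `copy` is logically a separate value
copy += " and carriage"      // mutating `copy` leaves `original` untouched

print(original)  // prints "Horse"
print(copy)      // prints "Horse and carriage"
```

Whether the underlying storage is physically duplicated at the assignment or only at the first mutation is an optimization detail; the observable result is the same either way.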

Working with Characters

You can access the individual Character values of a String by iterating over its characters property with a for-in loop:
<pre><code>
for character in "Dog!🐶".characters {
    print(character)
}
// D
// o
// g
// !
// 🐶
</code></pre>

Alternatively, you can create a stand-alone Character constant or variable from a single-character string literal by providing a Character type annotation:
<pre><code>
let exclamationMark: Character = "!"
</code></pre>

String values can be constructed by passing an array of Character values as an argument to the String initializer:
<pre><code>
let catCharacters: [Character] = ["C", "a", "t", "!", "🐱"]
let catString = String(catCharacters)
print(catString)
// prints "Cat!🐱"
</code></pre>
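
In later Swift releases (Swift 4 and newer, which this sketch assumes), String is itself a collection of Character values, so the same loop can be written without going through the characters property. The variable names here are illustrative only:

```swift
// Iterate a string's Character values directly (Swift 4 or later assumed).
let dog = "Dog!🐶"
var seen: [Character] = []
for character in dog {
    seen.append(character)
    print(character)
}

// Build a String back from the collected Character values.
let rebuilt = String(seen)
print(rebuilt == dog)  // prints "true"
```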

Concatenating Strings and Characters

String values can be added together (concatenated) with the addition operator (+) to create a new String value:
<pre><code>
let string1 = "hello"
let string2 = " there"
var welcome = string1 + string2
// welcome now equals "hello there"
</code></pre>

You can also append a String value to an existing String variable with the addition assignment operator (+=):
<pre><code>
var instruction = "look over"
instruction += string2
// instruction now equals "look over there"
</code></pre>

You can append a Character value to a String variable with the String type's append() method:
<pre><code>
let exclamationMark: Character = "!"
welcome.append(exclamationMark)
// welcome now equals "hello there!"
</code></pre>

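Note: You cannot append a String or Character to an existing Character variable, because a Character value must contain exactly one character.

String Interpolation

String interpolation builds a new String value from a mix of constants, variables, literals, and expressions by including their values inside a string literal. Each item that you insert is wrapped in a pair of parentheses, prefixed by a backslash: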
<pre><code>
let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"
</code></pre>

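Note: The expressions you write inside the parentheses of an interpolated string cannot contain an unescaped double quote (") or backslash (\), and cannot contain a carriage return or line feed.

Unicode

Unicode is an international standard for encoding, representing, and processing text in different writing systems. It lets you represent almost any character from any language in a standardized form, and read and write those characters to and from an external source such as a text file or web page. As noted at the start of this chapter, Swift's String and Character types are fully Unicode-compliant.

Unicode Scalars

Behind the scenes, Swift's native String type is built from Unicode scalar values. A Unicode scalar is a unique 21-bit number for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("🐥").

Note: A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive. Unicode scalars do not include the surrogate pair code points, which are the code points in the range U+D800 to U+DFFF inclusive.

Not every 21-bit Unicode scalar is assigned to a character; some are reserved for future assignment. Scalars that have been assigned to a character typically also have a name, such as LATIN SMALL LETTER A and FRONT-FACING BABY CHICK in the examples above.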
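
To make the scalar values concrete, here is a small sketch that lists the scalars of a string in U+XXXX notation. It assumes a current Swift toolchain, and the names `sample` and `codePoints` are illustrative:

```swift
// List each Unicode scalar of a string in U+XXXX form.
let sample = "a🐥"
let codePoints = sample.unicodeScalars.map { scalar -> String in
    let hex = String(scalar.value, radix: 16, uppercase: true)
    // Pad to at least four hex digits, as is conventional for code points.
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}
print(codePoints)  // prints ["U+0061", "U+1F425"]
```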

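Special Characters in String Literals

String literals can include the following special characters:

The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quote) and \' (single quote)
An arbitrary Unicode scalar, written as \u{n}, where n is a 1-8 digit hexadecimal number whose value is a valid Unicode code point

The code below shows four examples of these special characters: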
<pre><code>
let wiseWords = "\"Imagination is more important than knowledge\" - Einstein"
// "Imagination is more important than knowledge" - Einstein
let dollarSign = "\u{24}" // $, Unicode scalar U+0024
let blackHeart = "\u{2665}" // ♥, Unicode scalar U+2665
let sparklingHeart = "\u{1F496}" // 💖, Unicode scalar U+1F496
</code></pre>

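Extended Grapheme Clusters

Every instance of Swift's Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that, when combined, produce a single human-readable character.

For example, the letter é can be represented as the single Unicode scalar é (LATIN SMALL LETTER E WITH ACUTE, or U+00E9). The same letter can also be represented as a pair of scalars: a standard letter e (LATIN SMALL LETTER E, or U+0065) followed by the COMBINING ACUTE ACCENT scalar (U+0301). The COMBINING ACUTE ACCENT scalar is graphically applied to the scalar that precedes it, turning an e into an é when rendered by a Unicode-aware text-rendering system.

In both cases, the letter é is a single Swift Character value that represents an extended grapheme cluster. In the first case the cluster contains a single scalar; in the second it contains two: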
<pre><code>
let eAcute: Character = "\u{E9}" // é
let combinedEAcute: Character = "\u{65}\u{301}" // e followed by ́
// eAcute is é, combinedEAcute is é
</code></pre>

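Extended grapheme clusters are a flexible way to represent many complex script characters as a single Character value, for example the Korean Hangul syllable 한, which can be written in precomposed or decomposed form: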
<pre><code>
let precomposed: Character = "\u{D55C}" // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}" // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
</code></pre>

Extended grapheme clusters let scalars for enclosing marks (such as COMBINING ENCLOSING CIRCLE, or U+20DD) enclose other Unicode scalars as part of a single Character value:

<pre><code>
let enclosedEAcute: Character = "\u{E9}\u{20DD}"
let enclosedEAcute2: Character = "\u{65}\u{301}\u{20DD}"
// both enclosedEAcute and enclosedEAcute2 are é⃝
</code></pre>

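Unicode scalars for regional indicator symbols can be combined in pairs to make a single Character value, such as this combination of REGIONAL INDICATOR SYMBOL LETTER U (U+1F1FA) and REGIONAL INDICATOR SYMBOL LETTER S (U+1F1F8):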
<pre><code>
let regionalIndicatorForUS: Character = "\u{1F1FA}\u{1F1F8}"
// regionalIndicatorForUS is 🇺🇸 (if this does not render as the US flag, the browser may not support it)
</code></pre>
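
The relationship between scalars and characters can be checked by comparing the two counts for each cluster. This is a sketch that assumes Swift 4 or later, where `count` is available directly on String; the variable names are illustrative:

```swift
// Each string below is a single Character built from one or more scalars.
let precomposedE = "\u{E9}"          // é as a single scalar
let decomposedE = "\u{65}\u{301}"    // e followed by COMBINING ACUTE ACCENT
let flag = "\u{1F1FA}\u{1F1F8}"      // two regional indicator symbols

for text in [precomposedE, decomposedE, flag] {
    // Character count first, Unicode scalar count second.
    print(text.count, text.unicodeScalars.count)
}
// prints:
// 1 1
// 1 2
// 1 2
```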

Counting Characters

To retrieve a count of the Character values in a string, use the count property of the string's characters property:
<pre><code>
let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
print("unusualMenagerie has \(unusualMenagerie.characters.count) characters")
// prints "unusualMenagerie has 40 characters"
</code></pre>

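Note that because Swift uses extended grapheme clusters for Character values, concatenating to or modifying a string does not always change its character count. For example: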
<pre><code>
var word = "cafe"
print("the number of characters in \(word) is \(word.characters.count)")
// prints "the number of characters in cafe is 4"

word += "\u{301}" // COMBINING ACUTE ACCENT, U+0301

print("the number of characters in \(word) is \(word.characters.count)")
// prints "the number of characters in café is 4"
</code></pre>

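Note: Extended grapheme clusters can be composed of one or more Unicode scalars. This means that different characters, and different representations of the same character, can require different amounts of memory to store. Because of this, characters in Swift do not each take up the same amount of memory within a string's representation. As a result, the number of characters in a string cannot be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long strings, be aware that the characters property must iterate over the Unicode scalars in the entire string to determine the characters for that string.
<br />
The character count obtained from the characters property is not always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units in the string's UTF-16 representation, not on the number of extended grapheme clusters in the string.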
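
The difference between the two counts can be seen with the menagerie string used earlier. This sketch assumes a current Swift toolchain with Foundation available; it uses `NSString(string:)` rather than an `as` cast so that it does not depend on Objective-C bridging, and the name `menagerie` is illustrative:

```swift
import Foundation

let menagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"

// Number of Character values (extended grapheme clusters).
print(menagerie.count)                        // prints "40"

// Number of UTF-16 code units; each emoji here needs a surrogate pair.
print(menagerie.utf16.count)                  // prints "44"

// NSString's length reports the UTF-16 code unit count.
print(NSString(string: menagerie).length)     // prints "44"
```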

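Accessing and Modifying a String

You access and modify a string through its methods and properties, or by using subscript syntax.

String Indexes

Each String value has an associated index type, String.Index, which corresponds to the position of each Character in the string.

As mentioned above, different characters can require different amounts of memory to store, so to determine which Character is at a particular position you must iterate over each Unicode scalar from the start or end of the string. For this reason, Swift strings cannot be indexed by integer values.

Use the startIndex property to access the position of the first Character of a String. The endIndex property is the "past-the-end" position, that is, the position after the last character. If a String is empty, startIndex and endIndex are equal.

A String.Index value can access its immediately preceding index by calling the predecessor() method, and its immediately succeeding index by calling the successor() method. Any index in a String can be reached from any other index by chaining these methods together, or by using the global advance(start:n:) function. Attempting to access an index outside of a string's range triggers a runtime error.

You can use subscript syntax to access the Character at a particular String index. Attempting to access a Character at an index outside of a string's range triggers a runtime error: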
<pre><code>
let greeting = "Guten Tag"
greeting[greeting.startIndex]
// G
greeting[greeting.endIndex.predecessor()]
// g
greeting[greeting.startIndex.successor()]
// u
let index = advance(greeting.startIndex, 7)
greeting[index]
// a
greeting[greeting.endIndex] // error
greeting.endIndex.successor() // error
</code></pre>
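
The index methods shown above belong to the Swift version these notes were written against. As a hedged sketch of how the same lookups read in current Swift (Swift 4 or later assumed), the index arithmetic moves onto the string itself via `index(before:)`, `index(after:)` and `index(_:offsetBy:)`:

```swift
let greeting = "Guten Tag"

let first = greeting[greeting.startIndex]
let last = greeting[greeting.index(before: greeting.endIndex)]
let second = greeting[greeting.index(after: greeting.startIndex)]
let eighth = greeting[greeting.index(greeting.startIndex, offsetBy: 7)]
print(first, last, second, eighth)  // prints "G g u a"

// index(_:offsetBy:limitedBy:) returns nil instead of trapping when the
// requested offset would run past the given limit.
let tooFar = greeting.index(greeting.startIndex, offsetBy: 20, limitedBy: greeting.endIndex)
print(tooFar == nil)  // prints "true"
```

The `limitedBy:` variant is one way to avoid the runtime error that an out-of-range index would otherwise cause.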

Use the global function indices(_:) to create a Range of all of the indexes used to access individual characters in a string:
<pre><code>
for index in indices(greeting) {
    print("\(greeting[index]) ")
}
print("\n")
// prints "G u t e n T a g"
</code></pre>

Note: In the Swift 2 pre-release, calling this function produces the message "access the 'indices' property on the collection", so the example above becomes:
<pre><code>
for index in greeting.characters.indices {
    print("\(greeting[index])")
}
print("\n")
</code></pre>

Inserting and Removing

To insert a character into a string at a specified index, use the insert(_:atIndex:) method:
<pre><code>
var welcome = "hello"
welcome.insert("!", atIndex: welcome.endIndex)
// welcome now equals "hello!"
</code></pre>

To insert the contents of another string at a specified index, use the splice(_:atIndex:) method:
<pre><code>
welcome.splice(" there".characters, atIndex: welcome.endIndex.predecessor())
// welcome now equals "hello there!"
</code></pre>

To remove a character from a string at a specified index, use the removeAtIndex(_:) method:
<pre><code>
welcome.removeAtIndex(welcome.endIndex.predecessor())
// welcome now equals "hello there"
</code></pre>

To remove a substring at a specified range, use the removeRange(_:) method:
<pre><code>
let range = advance(welcome.endIndex, -6) ..< welcome.endIndex
welcome.removeRange(range)
// welcome now equals "hello"
</code></pre>
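
For readers on a current toolchain, the same four edits can be sketched with the method names used since Swift 4 (`insert(_:at:)`, `insert(contentsOf:at:)`, `remove(at:)` and `removeSubrange(_:)`). The `steps` array is only there to record each intermediate value and is not part of the original example:

```swift
var welcome = "hello"
var steps: [String] = []   // records the string after each edit

welcome.insert("!", at: welcome.endIndex)
steps.append(welcome)      // "hello!"

welcome.insert(contentsOf: " there", at: welcome.index(before: welcome.endIndex))
steps.append(welcome)      // "hello there!"

welcome.remove(at: welcome.index(before: welcome.endIndex))
steps.append(welcome)      // "hello there"

let range = welcome.index(welcome.endIndex, offsetBy: -6)..<welcome.endIndex
welcome.removeSubrange(range)
steps.append(welcome)      // "hello"

print(steps)  // prints ["hello!", "hello there!", "hello there", "hello"]
```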

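Comparing Strings

Swift provides three ways to compare textual values: string and character equality, prefix equality, and suffix equality.

String and Character Equality

String and character equality is checked with the "equal to" operator (==) and the "not equal to" operator (!=):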
<pre><code>
let quotation = "We're a lot alike, you and I."
let sameQuotation = "We're a lot alike, you and I."
if quotation == sameQuotation {
    print("These two strings are considered equal")
}
// prints "These two strings are considered equal"
</code></pre>

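Two String values (or two Character values) are considered equal if their extended grapheme clusters are canonically equivalent. Extended grapheme clusters are canonically equivalent if they have the same linguistic meaning and appearance, even if they are composed from different Unicode scalars behind the scenes.

For example, LATIN SMALL LETTER E WITH ACUTE (U+00E9) is canonically equivalent to LATIN SMALL LETTER E (U+0065) followed by COMBINING ACUTE ACCENT (U+0301). Both of these extended grapheme clusters are valid ways to represent the character é, so they are considered canonically equivalent: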
<pre><code>
// "Voulez-vous un café?" using LATIN SMALL LETTER E WITH ACUTE
let eAcuteQuestion = "Voulez-vous un caf\u{E9}?"

// "Voulez-vous un café?" using LATIN SMALL LETTER E and COMBINING ACUTE ACCENT
let combinedEAcuteQuestion = "Voulez-vous un caf\u{65}\u{301}?"

if eAcuteQuestion == combinedEAcuteQuestion {
    print("These two strings are considered equal")
}
// prints "These two strings are considered equal"
</code></pre>

Conversely, LATIN CAPITAL LETTER A (U+0041, or "A"), as used in English, is not equivalent to CYRILLIC CAPITAL LETTER A (U+0410, or "А"), as used in Russian. The two characters look alike but do not have the same linguistic meaning:
<pre><code>
let latinCapitalLetterA: Character = "\u{41}"
let cyrillicCapitalLetterA: Character = "\u{0410}"
if latinCapitalLetterA != cyrillicCapitalLetterA {
    print("These two characters are not equivalent")
}
// prints "These two characters are not equivalent"
</code></pre>

Note: String and character comparisons in Swift are not locale-sensitive.
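
A short sketch of what canonical equivalence means in practice: two strings can compare equal with == while their underlying scalar sequences differ. This assumes a current Swift toolchain, and the variable names are illustrative:

```swift
let composedCafe = "caf\u{E9}"            // é as a single scalar
let decomposedCafe = "caf\u{65}\u{301}"   // e followed by COMBINING ACUTE ACCENT

// Equal as strings: == compares extended grapheme clusters canonically.
print(composedCafe == decomposedCafe)  // prints "true"

// Not equal scalar by scalar: the stored sequences are different.
let sameScalars = composedCafe.unicodeScalars.elementsEqual(decomposedCafe.unicodeScalars)
print(sameScalars)  // prints "false"
print(composedCafe.unicodeScalars.count, decomposedCafe.unicodeScalars.count)  // prints "4 5"
```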

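Prefix and Suffix Equality

To check whether a string has a particular string prefix or suffix, call the string's hasPrefix(_:) and hasSuffix(_:) methods, both of which take a single argument of type String and return a Boolean value.

The examples below use an array of strings representing the scene locations from the first two acts of Shakespeare's play Romeo and Juliet: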
<pre><code>
let romeoAndJuliet = [
    "Act 1 Scene 1: Verona, A public place",
    "Act 1 Scene 2: Capulet's mansion",
    "Act 1 Scene 3: A room in Capulet's mansion",
    "Act 1 Scene 4: A street outside Capulet's mansion",
    "Act 1 Scene 5: The Great Hall in Capulet's mansion",
    "Act 2 Scene 1: Outside Capulet's mansion",
    "Act 2 Scene 2: Capulet's orchard",
    "Act 2 Scene 3: Outside Friar Lawrence's cell",
    "Act 2 Scene 4: A street in Verona",
    "Act 2 Scene 5: Capulet's mansion",
    "Act 2 Scene 6: Friar Lawrence's cell"
]
</code></pre>

You can use the hasPrefix(_:) method with the romeoAndJuliet array to count the number of scenes in Act 1 of the play:
<pre><code>
var act1SceneCount = 0
for scene in romeoAndJuliet {
    if scene.hasPrefix("Act 1 ") {
        act1SceneCount += 1
    }
}
print("There are \(act1SceneCount) scenes in Act 1")
// prints "There are 5 scenes in Act 1"
</code></pre>

Similarly, use the hasSuffix(_:) method to count the number of scenes that take place in or around Capulet's mansion and Friar Lawrence's cell:
<pre><code>
var mansionCount = 0
var cellCount = 0
for scene in romeoAndJuliet {
    if scene.hasSuffix("Capulet's mansion") {
        mansionCount += 1
    } else if scene.hasSuffix("Friar Lawrence's cell") {
        cellCount += 1
    }
}
print("\(mansionCount) mansion scenes; \(cellCount) cell scenes")
// prints "6 mansion scenes; 2 cell scenes"
</code></pre>

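Note: The hasPrefix(_:) and hasSuffix(_:) methods perform a character-by-character canonical equivalence comparison between the extended grapheme clusters in each string, as described in String and Character Equality.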
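
The counting loops above can also be written with `filter`, which some readers find easier to scan. This is only an alternative sketch, run against a shortened, illustrative scene list (`sceneList`) rather than the full array:

```swift
// A shortened scene list, used here only for illustration.
let sceneList = [
    "Act 1 Scene 1: Verona, A public place",
    "Act 1 Scene 2: Capulet's mansion",
    "Act 2 Scene 1: Outside Capulet's mansion",
    "Act 2 Scene 3: Outside Friar Lawrence's cell",
]

// Count the entries that match each prefix or suffix.
let act1Scenes = sceneList.filter { $0.hasPrefix("Act 1 ") }.count
let mansionScenes = sceneList.filter { $0.hasSuffix("Capulet's mansion") }.count
let cellScenes = sceneList.filter { $0.hasSuffix("Friar Lawrence's cell") }.count

print(act1Scenes, mansionScenes, cellScenes)  // prints "2 2 1"
```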

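Unicode Representations of Strings

When a Unicode string is written to a text file or some other storage, the Unicode scalars in that string are encoded in one of several Unicode-defined encoding forms. Each form encodes the string in small chunks known as code units.

UTF-8 encodes a string as 8-bit code units
UTF-16 encodes a string as 16-bit code units
UTF-32 encodes a string as 32-bit code units

Swift provides several different ways to access Unicode representations of strings. You can iterate over the string with a for-in statement to access its individual Character values as extended grapheme clusters.

Alternatively, you can access a String value in one of three other Unicode-compliant representations:

A collection of UTF-8 code units (accessed with the string's utf8 property)
A collection of UTF-16 code units (accessed with the string's utf16 property)
A collection of 21-bit Unicode scalar values, equivalent to the string's UTF-32 encoding form (accessed with the string's unicodeScalars property)

Each example below shows a different representation of the following string, which is made up of the characters D, o, g, ‼ (DOUBLE EXCLAMATION MARK, or Unicode scalar U+203C), and the 🐶 character (DOG FACE, or Unicode scalar U+1F436):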
<pre><code>
let dogString = "Dog‼🐶"
</code></pre>

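UTF-8 Representation

You can access a UTF-8 representation of a String by iterating over its utf8 property. This property is of type String.UTF8View, which is a collection of unsigned 8-bit (UInt8) values, one for each byte in the string's UTF-8 representation:

UTF-8 representation of the string `dogString`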

<pre><code>
for codeUnit in dogString.utf8 {
    print("\(codeUnit) ", appendNewline: false)
}
print("")
// 68 111 103 226 128 188 240 159 144 182
</code></pre>

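In the example above, the first three decimal codeUnit values (68, 111, 103) represent the characters D, o, and g, whose UTF-8 representation is the same as their ASCII representation. The next three codeUnit values (226, 128, 188) are a three-byte UTF-8 representation of the DOUBLE EXCLAMATION MARK character. The last four codeUnit values (240, 159, 144, 182) are a four-byte UTF-8 representation of the DOG FACE character.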
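
As a quick check of the byte values listed above, the following sketch decodes them back into a string. It assumes Swift 4 or later, where `String(decoding:as:)` is available, and the names `bytes` and `decoded` are illustrative:

```swift
// The ten UTF-8 code units listed above, as raw bytes.
let bytes: [UInt8] = [68, 111, 103, 226, 128, 188, 240, 159, 144, 182]

// Decode the bytes as UTF-8 to recover the original string.
let decoded = String(decoding: bytes, as: UTF8.self)
print(decoded)                        // prints "Dog‼🐶"

// Encoding the result again yields the same bytes.
print(Array(decoded.utf8) == bytes)   // prints "true"
```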

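UTF-16 Representation

You can access a UTF-16 representation of a String by iterating over its utf16 property. This property is of type String.UTF16View, which is a collection of unsigned 16-bit (UInt16) values, one for each 16-bit code unit in the string's UTF-16 representation:

UTF-16 representation of the string `dogString`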

<pre><code>
for codeUnit in dogString.utf16 {
    print("\(codeUnit) ", appendNewline: false)
}
print("")
// 68 111 103 8252 55357 56374
</code></pre>

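Again, the first three codeUnit values (68, 111, 103) represent the characters D, o, and g, whose UTF-16 code units have the same values as in the string's UTF-8 representation (because these Unicode scalars represent ASCII characters).

The fourth codeUnit value (8252) is the decimal equivalent of the hexadecimal value 203C, which represents the Unicode scalar U+203C for the DOUBLE EXCLAMATION MARK character. This character can be represented as a single code unit in UTF-16.

The fifth and sixth codeUnit values (55357 and 56374) are a UTF-16 surrogate pair representing the DOG FACE character. These values are a high-surrogate value of U+D83D (decimal 55357) and a low-surrogate value of U+DC36 (decimal 56374).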
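
The surrogate values quoted above follow from the standard UTF-16 arithmetic: subtract 0x10000 from the scalar value, then split the remaining 20 bits into two 10-bit halves. A small sketch of that calculation, with illustrative variable names and written for a current Swift toolchain:

```swift
// Derive the UTF-16 surrogate pair for DOG FACE (U+1F436) by hand.
let scalarValue: UInt32 = 0x1F436
let offset = scalarValue - 0x10000             // 0xF436, a 20-bit value

let high = UInt16(0xD800 + (offset >> 10))     // top 10 bits    -> 0xD83D
let low = UInt16(0xDC00 + (offset & 0x3FF))    // bottom 10 bits -> 0xDC36
print(high, low)                               // prints "55357 56374"

// The standard library's UTF-16 view produces the same two code units.
print(Array("\u{1F436}".utf16) == [high, low]) // prints "true"
```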

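Unicode Scalar Representation

You can access a Unicode scalar representation of a String value by iterating over its unicodeScalars property. This property is of type UnicodeScalarView, which is a collection of values of type UnicodeScalar. Each UnicodeScalar has a value property that returns the scalar's 21-bit value, represented within a UInt32 value:

Unicode scalar representation of the string `dogString`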

<pre><code>
for scalar in dogString.unicodeScalars {
    print("\(scalar.value) ", appendNewline: false)
}
print("")
// 68 111 103 8252 128054
</code></pre>

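The value properties of the first three UnicodeScalar values (68, 111, 103) once again represent the characters D, o, and g.

The fourth scalar value (8252) is again the decimal equivalent of the hexadecimal value 203C, which represents the Unicode scalar U+203C for the DOUBLE EXCLAMATION MARK character.

The value property of the fifth and final UnicodeScalar, 128054, is the decimal equivalent of the hexadecimal value 1F436, which represents the Unicode scalar U+1F436 for the DOG FACE character.

As an alternative to querying their value properties, each UnicodeScalar value can also be used to construct a new String value, such as with string interpolation: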
<pre><code>
for scalar in dogString.unicodeScalars {
    print("\(String(scalar)) -- ", appendNewline: false)
    print("\(scalar)")
}
// D -- D
// o -- o
// g -- g
// ‼ -- ‼
// 🐶 -- 🐶
</code></pre>
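
To summarize the three representations side by side, this sketch prints the number of elements in each view of the same string. It assumes Swift 4 or later, where `count` is available directly on String, and the name `dogText` is illustrative:

```swift
let dogText = "Dog‼🐶"

print(dogText.count)                  // Character values:    prints "5"
print(dogText.unicodeScalars.count)   // Unicode scalars:     prints "5"
print(dogText.utf16.count)            // UTF-16 code units:   prints "6"
print(dogText.utf8.count)             // UTF-8 code units:    prints "10"

// The scalar values, joined on one line for comparison with the text above.
let scalarValues = dogText.unicodeScalars.map { String($0.value) }
print(scalarValues.joined(separator: " "))  // prints "68 111 103 8252 128054"
```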

Reference: The Swift Programming Language
