Strings and Characters
Strings are collections of characters. Strings have the type `String`, and characters have the type `Character`. Strings can be used to work with text in a Unicode-compliant way. Strings are immutable.
String and character literals are enclosed in double quotation marks (`"`).
String literals may contain escape sequences. An escape sequence starts with a backslash (`\`):
- `\0`: Null character
- `\\`: Backslash
- `\t`: Horizontal tab
- `\n`: Line feed
- `\r`: Carriage return
- `\"`: Double quotation mark
- `\'`: Single quotation mark
- `\u`: A Unicode scalar value, written as `\u{x}`, where `x` is a 1–8 digit hexadecimal number that must be a valid Unicode scalar value, i.e., in the range 0 to 0xD7FF or 0xE000 to 0x10FFFF, inclusive
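As a minimal sketch of the escape sequences listed above, the following declarations show a few of them inside string literals. The constant names (`tabbed`, `quoted`, `umlaut`) are made up for illustration, and the declarations are written as they might appear in a REPL session or inside a function body.

```cadence
// A string literal containing a horizontal tab and a line feed
let tabbed: String = "name:\tCadence\n"

// Escaped double quotation marks and an escaped backslash
let quoted: String = "She said \"hi\" and typed a \\"

// A Unicode scalar value written as \u{x}: U+00FC is ü
let umlaut: String = "\u{FC}"
```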
The type `Character` represents a single, human-readable character. Characters are extended grapheme clusters, which consist of one or more Unicode scalars.
For example, the single character ü can be represented in several ways in Unicode. First, it can be represented by a single Unicode scalar value: ü ("LATIN SMALL LETTER U WITH DIAERESIS", code point U+00FC). Second, the same single character can be represented by two Unicode scalar values: u ("LATIN SMALL LETTER U", code point U+0075), and "COMBINING DIAERESIS" (code point U+0308). The combining Unicode scalar value is applied to the scalar before it, which turns a u into a ü.
Still, both variants represent the same human-readable character ü.
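The two representations of ü could be written as follows. This is an illustrative sketch and the constant names are hypothetical. Both literals are declared with the type `Character`, since each one forms a single extended grapheme cluster.

```cadence
// ü as a single Unicode scalar value, U+00FC
let singleScalar: Character = "\u{FC}"

// ü as two Unicode scalar values:
// U+0075 (u) followed by U+0308 (combining diaeresis)
let twoScalars: Character = "\u{75}\u{308}"
```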
Another example where multiple Unicode scalar values are rendered as a single, human-readable character is a flag emoji. These emojis consist of two "REGIONAL INDICATOR SYMBOL LETTER" Unicode scalar values.
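As a sketch of the flag case, the Canadian flag emoji can be written with two escape sequences, one for each regional indicator symbol. The constant name `canadianFlag` is chosen here for illustration.

```cadence
// A single character, the emoji for the Canadian flag, made of two
// "REGIONAL INDICATOR SYMBOL LETTER" Unicode scalar values:
// U+1F1E8 (C) and U+1F1E6 (A)
let canadianFlag: Character = "\u{1F1E8}\u{1F1E6}"
```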