|
| Utf8Char () |
| Initializes an empty Utf8Char.
|
|
| Utf8Char (const std::string::iterator &basicStringIterator, size_t codeUnitCount) |
| Initializes an instance of Utf8Char using an iterator pointing to some code units and a count of the code units that comprise the character.
|
|
| Utf8Char (const std::string::const_iterator &basicStringIterator, size_t codeUnitCount) |
| Initializes an instance of Utf8Char using a const_iterator pointing to some code units and a count of the code units that comprise the character.
|
|
| Utf8Char (const char character) |
| Initializes an instance of Utf8Char using an 8-bit ASCII character.
|
|
| Utf8Char (const char *character) |
| Initializes an instance of Utf8Char using a variable-code-unit character.
|
|
| Utf8Char (const Utf8Char &character) |
| Initializes an instance of Utf8Char using another instance of Utf8Char.
|
|
virtual | ~Utf8Char () |
| The class destructor.
|
|
bool | operator== (const Utf8Char &otherCharacter) const |
| Compares the value of this character to the value of another character and tests whether the two character values are the same.
|
|
bool | operator!= (const Utf8Char &otherCharacter) const |
| Compares the value of this character to the value of another character and tests whether the two character values are different.
|
|
bool | operator== (const Utf8CharReference &characterReference) const |
| Compares the value of this character to the value of a character reference and tests whether the two character values are the same.
|
|
bool | operator!= (const Utf8CharReference &characterReference) const |
| Compares the value of this character to the value of a character reference and tests whether the two character values are different.
|
|
Utf8Char & | operator= (const Utf8Char &anotherCharacter) |
| Assigns the contents of another Utf8Char object to this object.
|
|
Utf8Char & | operator= (const Utf8CharReference &characterReference) |
| Assigns the contents of a Utf8CharReference object to this object.
|
|
char & | operator[] (const size_t index) |
| Returns the code unit found at the specified index.
|
|
const char & | operator[] (const size_t index) const |
| Returns the code unit found at the specified index.
|
|
| operator Utf16Char () const |
| Converts this object to a Utf16Char object.
|
|
const char * | c_str () const |
| Returns a C-style version of this character as an array of 8-bit code units.
|
|
void | clear () |
| Clears the contents of the character, making it an empty character.
|
|
bool | is_valid () const |
| Indicates whether this character is a valid UTF-8 character.
|
|
UInt32 | to_utf_32 () const |
| Converts this character to a UTF-32 code point.
|
|
size_t | size () const |
| Returns the number of code units in this character.
|
|
Provides a copy of a UTF-8 character embedded in a UTF-8 string.
Since a Utf8String provides an interface that hides the individual code units, a character is not directly pulled out of the Utf8String, but is constructed from the underlying code units. This class holds a copy of those code units as a single character. Changing the value of a Utf8Char will not cause anything to be changed anywhere else.
Utf8Char and Utf8CharReference objects can be assigned to each other or converted from one to the other.
UtfString::Utf8Char::Utf8Char (const std::string::iterator &basicStringIterator, size_t codeUnitCount)
Initializes an instance of Utf8Char using an iterator pointing to some code units and a count of the code units that comprise the character.
This constructor assumes that codeUnitCount does not exceed the maximum number of code units allowed by the UTF-8 encoding. It also assumes that at least codeUnitCount code units are available in the string, starting at the location of the iterator. Neither assumption is checked: if the iterator points to a final one-code-unit character and a codeUnitCount of 2 is passed in, this constructor will read past the end of the string and a crash may result.
- Parameters
-
[in] | basicStringIterator | An iterator pointing to the code units to be stored in this character |
[in] | codeUnitCount | The number of code units that comprise this character |
UtfString::Utf8Char::Utf8Char (const std::string::const_iterator &basicStringIterator, size_t codeUnitCount)
Initializes an instance of Utf8Char using a const_iterator pointing to some code units and a count of the code units that comprise the character.
This constructor assumes that codeUnitCount does not exceed the maximum number of code units allowed by the UTF-8 encoding. It also assumes that at least codeUnitCount code units are available in the string, starting at the location of the iterator. Neither assumption is checked: if the iterator points to a final one-code-unit character and a codeUnitCount of 2 is passed in, this constructor will read past the end of the string and a crash may result.
- Parameters
-
[in] | basicStringIterator | An iterator pointing to the code units to be stored in this character |
[in] | codeUnitCount | The number of code units that comprise this character |
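Because the iterator constructors do not check their arguments, a caller may want to validate them first. The following is a minimal sketch of such a guard using only the standard library. The helper names expectedCodeUnitCount and canReadCodeUnits are hypothetical and are not part of UtfString, and the limit of 4 code units is taken from the const char * constructor description.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper (not part of UtfString): the number of code units
// announced by a UTF-8 leading byte, judged by bit pattern only, or 0 if the
// byte cannot start a sequence. Overlong or out-of-range forms are not rejected.
inline std::size_t expectedCodeUnitCount(unsigned char leadByte)
{
    if (leadByte < 0x80) return 1;            // 0xxxxxxx
    if ((leadByte & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((leadByte & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((leadByte & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                                 // continuation or invalid byte
}

// Hypothetical helper: true when codeUnitCount code units can be read from
// position without running past the end of text.
inline bool canReadCodeUnits(const std::string &text,
                             std::string::const_iterator position,
                             std::size_t codeUnitCount)
{
    assert(position >= text.begin() && position <= text.end());
    const std::size_t available =
        static_cast<std::size_t>(std::distance(position, text.end()));
    return codeUnitCount >= 1 && codeUnitCount <= 4 && codeUnitCount <= available;
}
```

A caller would construct the Utf8Char only when canReadCodeUnits returns true for the count it intends to pass.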
UtfString::Utf8Char::Utf8Char (const char character)
Initializes an instance of Utf8Char using an 8-bit ASCII character.
The character passed into this constructor is assumed to be a complete 8-bit ASCII character, and not a code unit that comprises some UTF-8 character. Note that this constructor only handles basic ASCII, the characters that match the first 128 code points in Unicode. Extended ASCII, which varies from platform to platform, will be converted to whatever code point happens to have the same value as the extended ASCII character.
- Parameters
-
[in] | character | The 8-bit ASCII character to use in initializing this character |
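To make the extended ASCII remark concrete, the sketch below shows what "the code point with the same value" looks like once encoded as UTF-8. It is an illustration under stated assumptions, not the library's implementation: the function name encodeByteAsCodePoint is hypothetical, and it assumes the character is interpreted as an unsigned value from 0 to 255.

```cpp
#include <string>

// Hypothetical illustration (not part of UtfString): encode a single byte as
// the Unicode code point that has the same numeric value.
inline std::string encodeByteAsCodePoint(char character)
{
    // Assumption: the byte is treated as an unsigned value in the range 0..255.
    const unsigned char value = static_cast<unsigned char>(character);
    std::string codeUnits;
    if (value < 0x80) {
        // Basic ASCII: one code unit, unchanged.
        codeUnits.push_back(static_cast<char>(value));
    } else {
        // U+0080..U+00FF: two code units, 110xxxxx 10xxxxxx.
        codeUnits.push_back(static_cast<char>(0xC0 | (value >> 6)));
        codeUnits.push_back(static_cast<char>(0x80 | (value & 0x3F)));
    }
    return codeUnits;
}
```

Under that assumption, the byte 0xE9 becomes code point U+00E9 and is stored as the two code units 0xC3 0xA9, whatever character the platform's extended ASCII table assigned to 0xE9.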
UtfString::Utf8Char::Utf8Char (const char *character)
Initializes an instance of Utf8Char using a variable-code-unit character.
The character passed into this constructor is in the form of a character string, which can contain one or more code units. The character parameter is assumed to be a valid pointer to a null-terminated character string. If the length of character is 0, this character will be initialized as an empty character. If the length of character is 1-4, the code units will be used to initialize this character, whether those code units represent a valid UTF-8 character or not (validity can be checked using the is_valid() function). If the length of character is more than 4, only the first four code units will be used, and the rest will be ignored.
- Parameters
-
[in] | character | A character string containing code units to use in initializing this character |
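The length rule described above can be sketched with the standard library alone. The function takeCodeUnits is a hypothetical stand-in that returns the code units such a constructor would keep. It is not the library's code, and it performs no UTF-8 validation, matching the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in (not part of UtfString): return the code units that
// would be kept from a null-terminated string, at most four of them.
inline std::string takeCodeUnits(const char *character)
{
    assert(character != nullptr);  // the pointer is assumed to be valid
    const std::size_t maxCodeUnits = 4;
    std::size_t length = 0;
    // Stop at the terminator or after four code units, whichever comes first.
    while (length < maxCodeUnits && character[length] != '\0') {
        ++length;
    }
    return std::string(character, length);
}
```

An empty string yields an empty result, a three-code-unit character is kept whole, and anything longer than four code units is truncated to the first four.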
static Utf8Char UtfString::Utf8Char::GetNextCharacter (std::istream &inputStream)
Gets the next UTF-8 code point from a stream of 8-bit code units.
This function assumes that inputStream is not at the end of the stream. If only a partial code point is available, this function returns as many code units as it can find. If the first available code unit is not a leading code unit, the number of code units in the code point cannot be determined, so only a single code unit is returned.
- Parameters
-
[in] | inputStream | The input stream from which the next code point will be retrieved |
- Returns
- A UTF-8 character containing the next code point found in the input stream
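The reading rules above can be sketched as follows. This is a hypothetical stand-in named readNextCodeUnits that returns the raw code units in a std::string rather than a Utf8Char. It is an assumption about how such a reader might work, not the library's implementation, and it does not check that trailing bytes are valid continuation bytes.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed for the std::istringstream usage described below
#include <string>

// Hypothetical stand-in (not part of UtfString): read the code units of the
// next code point from a stream of 8-bit code units.
inline std::string readNextCodeUnits(std::istream &inputStream)
{
    std::string codeUnits;
    char unit = 0;
    if (!inputStream.get(unit)) {
        return codeUnits;  // nothing available
    }
    codeUnits.push_back(unit);

    // The leading byte announces the length. Any other byte (ASCII, a stray
    // continuation byte, or an invalid byte) is returned on its own.
    const unsigned char lead = static_cast<unsigned char>(unit);
    std::size_t expected = 1;
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    // Read the remaining code units, stopping early if the stream runs out.
    while (codeUnits.size() < expected && inputStream.get(unit)) {
        codeUnits.push_back(unit);
    }
    return codeUnits;
}
```

Fed a std::istringstream holding the bytes 0x61 0xE2 0x82 0xAC 0xA9 0xC3, successive calls return "a", then the three code units of the euro sign, then the stray continuation byte 0xA9 alone, then the truncated leading byte 0xC3 alone.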
char& UtfString::Utf8Char::operator[] (const size_t index)
Returns the code unit found at the specified index.
This operator does not check for the validity of the index, so it assumes that index is less than the maximum number of code units allowed by the UTF-8 encoding.
Note that if the given index isn't within the bounds of the character (whether the index is valid or not), the character will be resized to allow that index to be read and written to.
- Parameters
-
[in] | index | The index identifying the code unit to be retrieved |
- Returns
- The code unit found at the specified index
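The resizing behavior noted above can be pictured with a small stand-in type. CodeUnitBuffer is hypothetical and is not the UtfString class. It assumes the code units live in a std::string and that newly created positions are filled with zero bytes, neither of which the documentation confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in (not the UtfString class): a buffer whose non-const
// operator[] grows the storage so that any requested index becomes usable.
struct CodeUnitBuffer
{
    std::string units;

    char &operator[](const std::size_t index)
    {
        if (index >= units.size()) {
            // Assumption: new positions are zero-filled.
            units.resize(index + 1, '\0');
        }
        return units[index];
    }

    const char &operator[](const std::size_t index) const
    {
        // A const object cannot grow, so the index must already be in bounds.
        assert(index < units.size());
        return units[index];
    }
};
```

Writing to index 2 of an empty CodeUnitBuffer leaves it with three code units, so an index beyond the UTF-8 maximum would silently produce an oversized character in a design like this. That is why the operator expects callers to keep the index in range.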
const char& UtfString::Utf8Char::operator[] (const size_t index) const
Returns the code unit found at the specified index.
This operator does not check for the validity of the index, so it assumes that index is less than the maximum number of code units allowed by the UTF-8 encoding.
Note that if the given index isn't within the bounds of the character (whether the index is valid or not), the character will not be resized, as this function can only be called on a constant.
- Parameters
-
[in] | index | The index identifying the code unit to be retrieved |
- Returns
- The code unit found at the specified index