UtfString
Contains and manages a UTF-16 string. More...
#include <Utf16String.h>
Classes | |
class | const_iterator |
An iterator that iterates through the code points in a UTF-16 string, but allowing only access to constant code points. More... | |
class | const_reverse_iterator |
An iterator that iterates through the code points in a UTF-16 string in reverse order, but allowing only access to constant code points. More... | |
class | iterator |
An iterator that iterates through the code points in a UTF-16 string. More... | |
class | reverse_iterator |
An iterator that iterates through the code points in a UTF-16 string in reverse order. More... | |
Public Member Functions | |
Utf16String () | |
The default constructor. | |
Utf16String (const Utf16String &utf16String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const std::basic_string< UInt16 > &utf16String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const UInt16 *utf16String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const UInt16 *utf16String, const size_t codeUnitCount) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const wchar_t *wideString) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const wchar_t *wideString, const size_t characterCount) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const std::wstring &wideString) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const Utf8String &utf8String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const std::string utf8String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const char *utf8String) | |
Initializes a string with the contents of another string as its initial value. More... | |
Utf16String (const char *utf8String, const size_t characterCount) | |
Initializes a string with the contents of another string as its initial value. More... | |
virtual | ~Utf16String () |
The class destructor. | |
Utf16String & | append (const Utf16String &utf16String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const std::basic_string< UInt16 > &utf16String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const UInt16 *utf16String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const UInt16 *utf16String, const size_t codeUnitCount) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const wchar_t *wideString) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const wchar_t *wideString, const size_t characterCount) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const std::wstring &wideString) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const Utf8String &utf8String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const std::string &utf8String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const char *utf8String) |
Appends the contents of another string to this string. More... | |
Utf16String & | append (const Utf16Char &utf16Character) |
Appends a UTF-16 character to this string. More... | |
Utf16String & | append (const char *utf8String, const size_t characterCount) |
Appends the contents of another string to this string. More... | |
Utf16String & | assign (const Utf16String &utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const std::basic_string< UInt16 > &utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const UInt16 *utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const UInt16 *utf16String, const size_t codeUnitCount) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const wchar_t *wideString) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const wchar_t *wideString, const size_t characterCount) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const std::wstring &wideString) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const Utf8String &utf8String) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const std::string &utf8String) |
Assigns the contents of another string to this string, replacing the current contents of this string. More... | |
Utf16String & | assign (const char *utf8String) |
Assigns the contents of another string to this string. More... | |
Utf16String & | assign (const char *utf8String, const size_t characterCount) |
Assigns the contents of another string to this string. More... | |
Utf16String & | assign (const Utf16Char &utf16Character) |
Assigns a UTF-16 character to this string. More... | |
Utf16CharReference | at (size_t index) |
Returns a reference to the character found at the specified character index. More... | |
const Utf16Char | at (size_t index) const |
Returns the character found at the specified character index. More... | |
iterator | begin () |
Returns an iterator pointing to the first character of a string. More... | |
const_iterator | begin () const |
Returns a constant iterator pointing to the first character of a string. More... | |
const UInt16 * | c_str () const |
Returns a C-style version of this string as an array of 16-bit code units. More... | |
size_t | capacity () const |
Returns the largest number of code units that can be stored in this string without increasing the memory allocation of this string. More... | |
void | clear () |
Clears out the string, leaving it an empty string. | |
size_t | code_unit_index (const size_t codePointIndex) |
Converts the index of a code point to the index of that code point's first code unit. More... | |
size_t | code_point_index (const size_t codeUnitIndex) |
Converts the index of a code unit to the index of the corresponding code point. More... | |
int | compare (const Utf16String &utf16String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const std::basic_string< UInt16 > &utf16String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const UInt16 *utf16String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const UInt16 *utf16String, const size_t codeUnitCount) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const wchar_t *wideString) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const wchar_t *wideString, const size_t codeUnitCount) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const std::wstring &wideString) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const Utf8String &utf8String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const std::string &utf8String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const char *utf8String) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
int | compare (const char *utf8String, const size_t codeUnitCount) const |
Compares the code points in this string with the code points in another string to determine if both are equal or if one is less than the other. More... | |
size_t | copy (UInt16 *codeUnitArray, const size_t codeUnitArraySize, const size_t characterCount, const size_t characterOffset=0) const |
Copies at most a specific number of code points in this string into an array of code units. More... | |
const UInt16 * | data () const |
Returns a pointer to an array of 16-bit code units containing the contents of this string. More... | |
bool | empty () const |
Indicates whether this string is empty. More... | |
iterator | end () |
Returns an iterator pointing to the location succeeding the last character in a string. More... | |
const_iterator | end () const |
Returns a constant iterator pointing to the location succeeding the last character in a string. More... | |
Utf16String::iterator | erase (const Utf16String::iterator &firstPosition, const Utf16String::iterator &lastPosition) |
Removes a range of characters from this string. More... | |
Utf16String::iterator | erase (const Utf16String::iterator &position) |
Removes a character from this string. More... | |
Utf16String & | erase (const size_t offset=0, const size_t count=npos) |
Removes a range of characters from this string. More... | |
size_t | find (const Utf16String &searchString, size_t offset=0) |
Searches this string for a specific substring. More... | |
size_t | find_first_not_of (const Utf16String &searchString, size_t offset=0) |
Searches this string for the first character that is not found in a given string. More... | |
size_t | find_first_of (const Utf16String &searchString, size_t offset=0) |
Searches this string for the first character that is found in a given string. More... | |
size_t | find_last_not_of (const Utf16String &searchString, size_t offset=npos) |
Searches this string for the last character that is not found in a given string. More... | |
size_t | find_last_of (const Utf16String &searchString, size_t offset=npos) |
Searches this string for the last character that is found in a given string. More... | |
Utf16String & | insert (const size_t index, const Utf16String &utf16String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const std::basic_string< UInt16 > &utf16String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const UInt16 *utf16String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const UInt16 *utf16String, const size_t codeUnitCount) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const wchar_t *wideString) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const wchar_t *wideString, const size_t codeUnitCount) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const std::wstring &wideString) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const Utf8String &utf8String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const std::string &utf8String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const char *utf8String) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const char *utf8String, const size_t codeUnitCount) |
Inserts the contents of another string into this string at a specified index. More... | |
Utf16String & | insert (const size_t index, const Utf16Char &utf16Character) |
Inserts a character into this string at a specified index. More... | |
bool | is_valid () const |
Indicates whether the code units in this string comprise a valid UTF-16 string. More... | |
size_t | length () const |
Returns the number of code points in this string. More... | |
void | push_back (const Utf16Char &character) |
Appends a character to the end of this string. More... | |
reverse_iterator | rbegin () |
Returns an iterator pointing to the first character of a reversed string, which corresponds to the last character of the normal string. More... | |
const_reverse_iterator | rbegin () const |
Returns a constant iterator pointing to the first character of a reversed string, which corresponds to the last character of a normal string. More... | |
reverse_iterator | rend () |
Returns an iterator pointing to the location succeeding the last character in a reversed string, which corresponds to the location preceding the first character in a normal string. More... | |
const_reverse_iterator | rend () const |
Returns a constant iterator pointing to the location succeeding the last character in a reversed string, which corresponds to the location preceding the first character in a normal string. More... | |
Utf16String & | replace (const size_t position, const size_t count, const Utf16String &replacementString) |
Removes a section of this string and replaces it with the contents of another string. More... | |
Utf16String & | replace (const size_t position, const size_t count, const size_t characterCount, const Utf16Char &character) |
Replaces the characters in a section of this string with the given character. More... | |
Utf16String & | replace (Utf16String::iterator beginIterator, Utf16String::iterator endIterator, const Utf16String &replacementString) |
Removes a section of this string and replaces it with the contents of another string. More... | |
Utf16String & | replace (Utf16String::iterator beginIterator, Utf16String::iterator endIterator, const size_t characterCount, const Utf16Char &character) |
Replaces the characters in a section of this string with the given character. More... | |
size_t | rfind (const Utf16String &searchString, size_t offset=npos) |
Searches this string backward for a specific substring. More... | |
size_t | size () const |
Returns the number of code units in this string. More... | |
Utf16String | substr (const size_t offset=0, const size_t count=npos) |
Returns a substring of this string. More... | |
void | swap (Utf16String &utf16String) |
Swaps the contents of this string with those of another string. More... | |
Utf16CharReference | operator[] (const size_t index) |
Returns the character found at the specified character index. More... | |
const Utf16Char | operator[] (const size_t index) const |
Returns the character found at the specified character index. More... | |
bool | operator== (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the two strings are the same. More... | |
bool | operator!= (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the two strings are different. More... | |
bool | operator< (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the value of this string is less than the value of the other string. More... | |
bool | operator<= (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the value of this string is less than or equal to the value of the other string. More... | |
bool | operator> (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the value of this string is greater than the value of the other string. More... | |
bool | operator>= (const Utf16String &otherString) const |
Compares the value of this string to the value of another string and tests whether the value of this string is greater than or equal to the value of the other string. More... | |
operator const std::basic_string< UInt16 > () const | |
Converts this object to a basic_string<UInt16> const instance. | |
operator std::basic_string< UInt16 > () | |
Converts this object to a basic_string<UInt16> instance. | |
Static Public Member Functions | |
static size_t | CharacterCodeUnitCount (const std::wstring::const_iterator &stringIterator) |
Counts the number of code units in the character that the string iterator is pointing to. More... | |
static size_t | CharacterCodeUnitCount (Utf16String::const_iterator &stringIterator) |
Counts the number of code units in the character that the string iterator is pointing to. More... | |
static size_t | CharacterCodeUnitCount (const std::basic_string< UInt16 >::const_iterator &stringIterator) |
Counts the number of code units in the character that the string iterator is pointing to. More... | |
static size_t | CharacterCodeUnitCount (const std::basic_string< UInt16 >::const_reverse_iterator &stringIterator) |
Counts the number of code units in the character that the string reverse iterator is pointing to. More... | |
static size_t | CharacterCodeUnitCount (const UInt16 *characterPointer) |
Counts the number of code units in the character that a character pointer is pointing to. More... | |
static UInt32 | DecodeCharacter (const Utf16Char &utf16Character) |
Decodes a UTF-16 character, returning the result as a 32-bit code point. More... | |
static UInt32 | DecodeCharacter (const UInt16 *characterPointer, const size_t codeUnitCount) |
Decodes a UTF-16 character, returning the result as a 32-bit code point. More... | |
static Utf16Char | EncodeCharacter (const UInt32 codePoint) |
Encodes a 32-bit code point as a UTF-16 character. More... | |
static bool | IsValidCharacter (const Utf16Char &utf16Character) |
Indicates whether the code units in a UTF-16 character comprise a valid UTF-16 character. More... | |
static bool | IsValidCharacter (const UInt16 *characterPointer, const size_t codeUnitCount) |
Indicates whether a series of 16-bit code units is a valid UTF-16 character. More... | |
static bool | IsWhitespace (UInt16 utf16Character) |
Indicates whether a UTF-16 character is a whitespace character. More... | |
static bool | IsWhitespace (const Utf16Char &utf16Character) |
Indicates whether a UTF-16 character is a whitespace character. More... | |
Static Public Attributes | |
static const size_t | npos |
An unsigned integral value initialized to -1. It indicates "not found" when returned by a search function, and "all remaining characters" when passed as a count or offset. | |
Friends | |
std::istream & | operator>> (std::istream &inputStream, Utf16String &utf16String) |
This operator converts a stream of 16-bit values to a UTF-16 string. More... | |
std::ostream & | operator<< (std::ostream &outputStream, const Utf16String &utf16String) |
This operator converts a UTF-16 string to a stream of 16-bit values. More... | |
std::wistream & | operator>> (std::wistream &inputStream, Utf16String &utf16String) |
This operator converts a wide stream of 16-bit values to a UTF-16 string. More... | |
std::wostream & | operator<< (std::wostream &outputStream, const Utf16String &utf16String) |
This operator converts a UTF-16 string to a wide stream of 16-bit values. More... | |
Contains and manages a UTF-16 string.
This class inherits the STL wstring class. The wstring class can hold UTF-16 strings, but string operations on them become difficult because UTF-16 characters are variable-length. The wstring class is best suited for fixed-length UCS-2 strings, which is why a separate class is needed for UTF-16 strings. When individual characters in a Utf16String are accessed, wstring objects are returned. In the vast majority of cases, the returned wstring contains a single 16-bit code unit, but in some instances it contains two 16-bit code units.
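The difference between code units and code points can be illustrated with a small standalone sketch. It uses only the standard library (std::u16string) rather than Utf16String itself, and the helper names IsHighSurrogate, IsLowSurrogate and CountCodePoints are hypothetical, chosen for illustration only. The input is assumed to be well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// A high (leading) surrogate lies in the range 0xD800..0xDBFF.
inline bool IsHighSurrogate(char16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// A low (trailing) surrogate lies in the range 0xDC00..0xDFFF.
inline bool IsLowSurrogate(char16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Counts the code points in a sequence of UTF-16 code units.
// A high surrogate followed by a low surrogate counts as one code point;
// every other code unit counts as one code point on its own.
std::size_t CountCodePoints(const std::u16string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++count)
    {
        const bool isPair = IsHighSurrogate(text[i])
            && i + 1 < text.size()
            && IsLowSurrogate(text[i + 1]);
        i += isPair ? 2 : 1;
    }
    return count;
}
```

For a string such as u"a\U0001F600b", the standard size() reports four code units while CountCodePoints reports three code points, which mirrors the distinction this class draws between size() and length().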
Using Utf16String is generally advised when dealing with UTF-16 files or when you know that you will be dealing with characters beyond the basic multilingual plane. If you won't be using characters beyond the basic multilingual plane, which is mostly populated by code points for use by Asian languages, then the fixed-width wstring class will probably suffice.
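To make "characters beyond the basic multilingual plane" concrete, the following standalone sketch shows how a code point above U+FFFF is split into two 16-bit code units (a surrogate pair) and recombined. It is not the implementation of EncodeCharacter or DecodeCharacter; the names EncodeUtf16 and DecodeUtf16 are hypothetical, and the code assumes valid input (a code point no greater than 0x10FFFF that is not itself a surrogate value).

```cpp
#include <cstdint>
#include <string>

// Encodes one code point as one or two UTF-16 code units.
// Assumes codePoint <= 0x10FFFF and is not in the surrogate range.
std::u16string EncodeUtf16(std::uint32_t codePoint)
{
    std::u16string units;
    if (codePoint < 0x10000)
    {
        // Basic multilingual plane: a single code unit suffices.
        units.push_back(static_cast<char16_t>(codePoint));
    }
    else
    {
        // Supplementary planes: split the 20-bit offset into two 10-bit halves.
        const std::uint32_t offset = codePoint - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}

// Decodes one character made of one or two UTF-16 code units.
// Assumes the input was produced by a valid encoder.
std::uint32_t DecodeUtf16(const std::u16string& units)
{
    if (units.size() == 2)
    {
        const std::uint32_t high = static_cast<std::uint32_t>(units[0]) - 0xD800;
        const std::uint32_t low = static_cast<std::uint32_t>(units[1]) - 0xDC00;
        return 0x10000 + (high << 10) + low;
    }
    return static_cast<std::uint32_t>(units[0]);
}
```

A code point such as U+20AC stays a single code unit, whereas U+1F600 becomes the pair 0xD83D, 0xDE00. A fixed-width string class would treat that pair as two separate characters.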
Endianness does not affect the UTF-16 string class. It stores the string as a sequence of 16-bit code units, and C++ abstracts the endianness of a particular code unit. The endianness would become important only if this string was directly converted into a sequence of bytes, where C++ would no longer be able to abstract the byte order.
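The point at which byte order starts to matter can be sketched as follows. This is a standalone illustration using std::u16string and a hypothetical helper named ToBytes; it is not part of Utf16String. The byte order is chosen explicitly when the 16-bit code units are split into bytes, and is invisible before that step.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serializes UTF-16 code units into bytes in the requested byte order.
// In memory each code unit is just a 16-bit value; the order of its two
// bytes is only decided here, at serialization time.
std::vector<std::uint8_t> ToBytes(const std::u16string& text, bool bigEndian)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text)
    {
        const std::uint8_t high = static_cast<std::uint8_t>(unit >> 8);
        const std::uint8_t low = static_cast<std::uint8_t>(unit & 0xFF);
        if (bigEndian)
        {
            bytes.push_back(high);
            bytes.push_back(low);
        }
        else
        {
            bytes.push_back(low);
            bytes.push_back(high);
        }
    }
    return bytes;
}
```

The same code unit 0x20AC yields the bytes 0x20, 0xAC in big-endian order and 0xAC, 0x20 in little-endian order, so a reader of the byte stream has to know which order was used.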
UtfString::Utf16String::Utf16String | ( | const Utf16String & | utf16String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf16String | A string of 16-bit code units to be the initial value of the string that is being created. |
UtfString::Utf16String::Utf16String | ( | const std::basic_string< UInt16 > & | utf16String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf16String | A string of 16-bit code units to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
UtfString::Utf16String::Utf16String | ( | const UInt16 * | utf16String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf16String | A string of 16-bit code units to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
UtfString::Utf16String::Utf16String | ( | const UInt16 * | utf16String, |
const size_t | codeUnitCount | ||
) |
Initializes a string with the contents of another string as its initial value.
[in] | utf16String | A string of 16-bit code units to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
[in] | codeUnitCount | The number of code units in utf16String to be used in initializing this string. |
UtfString::Utf16String::Utf16String | ( | const wchar_t * | wideString) |
Initializes a string with the contents of another string as its initial value.
[in] | wideString | A string of wchar_t to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
UtfString::Utf16String::Utf16String | ( | const wchar_t * | wideString, |
const size_t | characterCount | ||
) |
Initializes a string with the contents of another string as its initial value.
[in] | wideString | A string of wchar_t to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
[in] | characterCount | The number of characters in wideString to be used in initializing this string. |
UtfString::Utf16String::Utf16String | ( | const std::wstring & | wideString) |
Initializes a string with the contents of another string as its initial value.
[in] | wideString | A string of wchar_t to be the initial value of the string that is being created. The string is assumed to be a UTF-16 string. |
UtfString::Utf16String::Utf16String | ( | const Utf8String & | utf8String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf8String | A UTF-8 string to be made the initial value of the string that is being created. The string is assumed to be a valid UTF-8 string. |
UtfString::Utf16String::Utf16String | ( | const std::string | utf8String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf8String | A UTF-8 string to be made the initial value of the string that is being created. The string is assumed to be a valid UTF-8 string. |
UtfString::Utf16String::Utf16String | ( | const char * | utf8String) |
Initializes a string with the contents of another string as its initial value.
[in] | utf8String | A UTF-8 string to be made the initial value of the string that is being created. The string is assumed to be a valid UTF-8 string. |
UtfString::Utf16String::Utf16String | ( | const char * | utf8String, |
const size_t | characterCount | ||
) |
Initializes a string with the contents of another string as its initial value.
[in] | utf8String | A UTF-8 string to be made the initial value of the string that is being created. The string is assumed to be a valid UTF-8 string. |
[in] | characterCount | The number of characters in utf8String to be used in initializing this string. |
Utf16String& UtfString::Utf16String::append | ( | const Utf16String & | utf16String) |
Appends the contents of another string to this string.
[in] | utf16String | A string of 16-bit code units to be appended |
Utf16String& UtfString::Utf16String::append | ( | const std::basic_string< UInt16 > & | utf16String) |
Appends the contents of another string to this string.
[in] | utf16String | A string of 16-bit code units to be appended. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::append | ( | const UInt16 * | utf16String) |
Appends the contents of another string to this string.
[in] | utf16String | A string of 16-bit code units to be appended. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::append | ( | const UInt16 * | utf16String, |
const size_t | codeUnitCount | ||
) |
Appends the contents of another string to this string.
[in] | utf16String | A string of 16-bit code units to be appended. The string is assumed to be a valid UTF-16 string. |
[in] | codeUnitCount | The number of code units in utf16String to be appended |
Utf16String& UtfString::Utf16String::append | ( | const wchar_t * | wideString) |
Appends the contents of another string to this string.
[in] | wideString | A string of wchar_t to be appended. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::append | ( | const wchar_t * | wideString, |
const size_t | characterCount | ||
) |
Appends the contents of another string to this string.
[in] | wideString | A string of wchar_t to be appended. The string is assumed to be a valid UTF-16 string. |
[in] | characterCount | The number of characters in wideString to be appended |
Utf16String& UtfString::Utf16String::append | ( | const std::wstring & | wideString) |
Appends the contents of another string to this string.
[in] | wideString | A string of wchar_t to be appended. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::append | ( | const Utf8String & | utf8String) |
Appends the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be appended. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::append | ( | const std::string & | utf8String) |
Appends the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be appended. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::append | ( | const char * | utf8String) |
Appends the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be appended. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::append | ( | const Utf16Char & | utf16Character) |
Appends a UTF-16 character to this string.
[in] | utf16Character | A UTF-16 character to be appended. The character is assumed to be a valid UTF-16 character |
Utf16String& UtfString::Utf16String::append | ( | const char * | utf8String, |
const size_t | characterCount | ||
) |
Appends the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be appended. The string is assumed to be a valid UTF-8 string. |
[in] | characterCount | The number of characters in utf8String to be appended. |
Utf16String& UtfString::Utf16String::assign | ( | const Utf16String & | utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf16String | A string of 16-bit code units to be assigned |
Utf16String& UtfString::Utf16String::assign | ( | const std::basic_string< UInt16 > & | utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf16String | A string of 16-bit code units to be assigned. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::assign | ( | const UInt16 * | utf16String) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf16String | A string of 16-bit code units to be assigned. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::assign | ( | const UInt16 * | utf16String, |
const size_t | codeUnitCount | ||
) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf16String | A string of 16-bit code units to be assigned. The string is assumed to be a valid UTF-16 string. |
[in] | codeUnitCount | The number of code units in utf16String to be assigned |
Utf16String& UtfString::Utf16String::assign | ( | const wchar_t * | wideString) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | wideString | A string of wchar_t to be assigned. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::assign | ( | const wchar_t * | wideString, |
const size_t | characterCount | ||
) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | wideString | A string of wchar_t to be assigned. The string is assumed to be a valid UTF-16 string. |
[in] | characterCount | The number of characters in wideString to be assigned |
Utf16String& UtfString::Utf16String::assign | ( | const std::wstring & | wideString) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | wideString | A string of wchar_t to be assigned. The string is assumed to be a valid UTF-16 string. |
Utf16String& UtfString::Utf16String::assign | ( | const Utf8String & | utf8String) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf8String | A UTF-8 string to be assigned. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::assign | ( | const std::string & | utf8String) |
Assigns the contents of another string to this string, replacing the current contents of this string.
[in] | utf8String | A UTF-8 string to be assigned. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::assign | ( | const char * | utf8String) |
Assigns the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be assigned. The string is assumed to be a valid UTF-8 string. |
Utf16String& UtfString::Utf16String::assign | ( | const char * | utf8String, |
const size_t | characterCount | ||
) |
Assigns the contents of another string to this string.
[in] | utf8String | A UTF-8 string to be assigned. The string is assumed to be a valid UTF-8 string. |
[in] | characterCount | The number of characters in utf8String to be assigned |
Utf16String& UtfString::Utf16String::assign | ( | const Utf16Char & | utf16Character) |
Assigns a UTF-16 character to this string.
[in] | utf16Character | A UTF-16 character to be assigned. The character is assumed to be a valid UTF-16 character |
Utf16CharReference UtfString::Utf16String::at | ( | size_t | index) |
Returns a reference to the character found at the specified character index.
This function checks the validity of the index and throws an out_of_range exception when the given index doesn't correspond to a character within the string. Note that operator[] is a faster way to access a specific character, but doesn't check for index validity.
Note that unlike standard ASCII or UCS-2, which are fixed-length encodings, UTF-16 is a variable length encoding. This means that whereas accessing a character at a particular index is O(1) for fixed-length encodings, accessing a character in UTF-16 strings is O(1) in the best case and O(n) in the worst case.
So if you wish to iterate through the characters in this string, use the standard iterators instead of an indexer. The standard iterators will be far more efficient.
[in] | index | The index of a character in the string |
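The cost described above comes from having to walk the code units from the start of the string to find the character at a given index. The following is a minimal sketch of that walk, not the library's implementation: codePointAt is a hypothetical helper, and std::u16string stands in for the string's underlying code unit storage.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of UtfString): returns the code point at a
// character index, throwing std::out_of_range like at() is described to do.
char32_t codePointAt(const std::u16string& units, std::size_t index)
{
    std::size_t offset = 0;
    for (std::size_t current = 0; offset < units.size(); ++current)
    {
        const char32_t unit = units[offset];
        // A lead surrogate (0xD800-0xDBFF) starts a two-code-unit character.
        const bool isLead = unit >= 0xD800 && unit <= 0xDBFF;
        if (isLead)
        {
            // Validity assumption: a trail surrogate always follows a lead.
            assert(offset + 1 < units.size());
        }
        if (current == index)
        {
            if (!isLead) return unit;
            const char32_t trail = units[offset + 1];
            return 0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00);
        }
        offset += isLead ? 2 : 1; // every earlier character must be skipped
    }
    throw std::out_of_range("code point index out of range");
}
```

Calling such a helper once per index repeats the walk from the start each time, which is why a single pass with an iterator is the cheaper way to visit every character.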
const Utf16Char UtfString::Utf16String::at | ( | size_t | index) | const |
Returns the character found at the specified character index.
This function checks the validity of the index and throws an out_of_range exception when the given index doesn't correspond to a character within the string. Note that operator[] is a faster way to access a specific character, but doesn't check for index validity.
Note that unlike standard ASCII or UCS-2, which are fixed-length encodings, UTF-16 is a variable length encoding. This means that whereas accessing a character at a particular index is O(1) for fixed-length encodings, accessing a character in UTF-16 strings is O(1) in the best case and O(n) in the worst case.
So if you wish to iterate through the characters in this string, use the standard iterators instead of an indexer. The standard iterators will be far more efficient.
[in] | index | The index of a character in the string |
iterator UtfString::Utf16String::begin | ( | ) |
Returns an iterator pointing to the first character of a string.
const_iterator UtfString::Utf16String::begin | ( | ) | const |
Returns a constant iterator pointing to the first character of a string.
const UInt16* UtfString::Utf16String::c_str | ( | ) | const |
Returns a C-style version of this string as an array of 16-bit code units.
The c-style array is owned by this object, and the pointer returned by this function is invalidated if any non-const functions are called on this object.
size_t UtfString::Utf16String::capacity | ( | ) | const |
Returns the largest number of code units that can be stored in this string without increasing the memory allocation of this string.
When characters are added to the string and there is no more memory available to the string, the string allocates a chunk of memory. Memory is allocated in chunks significantly larger than necessary, so that performance doesn't suffer from lots of memory-allocation operations when many characters are added. The capacity indicates how many code units the string currently has memory for. When size() == capacity(), adding any more characters will cause the string to allocate more memory. Note that in all cases, size() <= capacity().
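The relationship between size() and capacity() can be observed with a small experiment. The sketch below uses std::u16string as a stand-in, on the assumption that Utf16String grows in a similar way; countCapacityChanges is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `count` code units one at a time and returns how many times the
// capacity changed, that is, how many times more memory had to be allocated.
std::size_t countCapacityChanges(std::size_t count)
{
    std::u16string text; // stand-in for Utf16String (assumption)
    std::size_t changes = 0;
    std::size_t lastCapacity = text.capacity();
    for (std::size_t i = 0; i < count; ++i)
    {
        text.push_back(u'x');
        assert(text.size() <= text.capacity()); // invariant stated above
        if (text.capacity() != lastCapacity)
        {
            ++changes;
            lastCapacity = text.capacity();
        }
    }
    return changes;
}
```

Because memory is reserved in large chunks, the number of capacity changes stays far below the number of appended code units.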
|
static |
Counts the number of code units in the character that the string iterator is pointing to.
If the iterator is pointing to the second code unit in a surrogate pair, a 0 is returned.
[in] | stringIterator | An iterator pointing to a character in a string |
|
static |
Counts the number of code units in the character that the string iterator is pointing to.
If the iterator is pointing to the second code unit in a surrogate pair, a 0 is returned.
[in] | stringIterator | An iterator pointing to a character in a string |
|
static |
Counts the number of code units in the character that the string iterator is pointing to.
If the iterator is pointing to the second code unit in a surrogate pair, a 0 is returned.
[in] | stringIterator | An iterator pointing to a character in a string |
|
static |
Counts the number of code units in the character that the string reverse iterator is pointing to.
If the reverse iterator is pointing to the first code unit in a surrogate pair, a 0 is returned.
Note that since this function deals with a reverse iterator, it is expecting the code units to be in the reverse order, meaning that the second surrogate code unit is expected to come before the first surrogate code unit. So when passing an iterator pointing to a two-code-unit character, make sure that the iterator is pointing at the very last code unit, and not the first as expected by the overload that accepts a forward iterator.
[in] | stringIterator | A reverse iterator pointing to a character in a string |
|
static |
Counts the number of code units in the character that a character pointer is pointing to.
If the pointer is pointing to the second code unit in a surrogate pair, a 0 is returned.
[in] | characterPointer | A pointer pointing to the first code unit of a UTF-16 code point |
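The forward and reverse counting rules described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the function names are hypothetical, and char16_t with std::u16string stands in for UInt16 and the string's own iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Forward direction: the pointer is expected at the first code unit.
std::size_t codeUnitCountForward(const char16_t* characterPointer)
{
    assert(characterPointer != nullptr);
    const char16_t unit = *characterPointer;
    if (unit >= 0xDC00 && unit <= 0xDFFF) return 0; // second (trail) surrogate
    if (unit >= 0xD800 && unit <= 0xDBFF) return 2; // first (lead) surrogate
    return 1;                                       // single-code-unit character
}

// Reverse direction: the iterator is expected at the last code unit, so the
// roles of the two surrogates are swapped.
std::size_t codeUnitCountReverse(std::u16string::const_reverse_iterator it)
{
    const char16_t unit = *it;
    if (unit >= 0xD800 && unit <= 0xDBFF) return 0; // first (lead) surrogate
    if (unit >= 0xDC00 && unit <= 0xDFFF) return 2; // second (trail) surrogate
    return 1;
}
```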
size_t UtfString::Utf16String::code_point_index | ( | const size_t | codeUnitIndex) |
Converts the index of a code unit to the index of the corresponding code point.
The code point index is the index used in this string to identify a particular code point. The code unit index is the index used in the underlying code unit string to identify a particular code unit.
It does not matter whether the code unit index is the first code unit in the code point or not. As long as the string is valid, this function will be able to find the corresponding code point for any code unit.
This function assumes is_valid() is true
[in] | codeUnitIndex | The index of the code unit whose corresponding code point is to be found |
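One way to picture this conversion: every code unit except a second (trail) surrogate starts a new code point, so counting those up to and including the given code unit yields the code point index. The sketch below assumes a valid string, uses std::u16string as a stand-in for the underlying code unit string, and uses the hypothetical name codePointIndexOf.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a code unit index to its code point index.
std::size_t codePointIndexOf(const std::u16string& units, std::size_t codeUnitIndex)
{
    assert(codeUnitIndex < units.size());
    std::size_t started = 0; // code points that start at or before codeUnitIndex
    for (std::size_t i = 0; i <= codeUnitIndex; ++i)
    {
        const char16_t unit = units[i];
        // Trail surrogates (0xDC00-0xDFFF) continue the previous code point.
        if (unit < 0xDC00 || unit > 0xDFFF) ++started;
    }
    assert(started > 0); // holds for valid UTF-16: the first unit is never a trail
    return started - 1;
}
```

Both code units of a surrogate pair map to the same code point index, which matches the statement that it does not matter which code unit of the code point is given.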
size_t UtfString::Utf16String::code_unit_index | ( | const size_t | codePointIndex) |
Converts the index of a code point to the index of that code point's first code unit.
The code point index is the index used in this string to identify a particular code point. The code unit index is the index used in the underlying code unit string to identify a particular code unit.
This function assumes is_valid() is true
[in] | codePointIndex | The index of the code point whose code units are to be found |
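The opposite conversion can be sketched by stepping over whole code points, advancing two code units whenever a first (lead) surrogate is seen. As before, this is a sketch assuming a valid string, with std::u16string as a stand-in and codeUnitIndexOf as a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a code point index to the index of the code
// point's first code unit.
std::size_t codeUnitIndexOf(const std::u16string& units, std::size_t codePointIndex)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < codePointIndex; ++i)
    {
        assert(offset < units.size()); // codePointIndex must lie within the string
        const char16_t unit = units[offset];
        offset += (unit >= 0xD800 && unit <= 0xDBFF) ? 2 : 1;
    }
    return offset;
}
```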
int UtfString::Utf16String::compare | ( | const Utf16String & | utf16String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf16String | A string to be compared to this string |
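Comparing by code point value is not the same as comparing raw 16-bit code units: a character above U+FFFF is stored as surrogates in the range 0xD800-0xDFFF, which sort below single code units such as 0xFF5E. The sketch below illustrates a code point comparison over std::u16string. The helper names and the -1/0/1 return convention are assumptions for illustration; the documentation above does not state the exact return values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes the code point starting at units[offset] and advances offset past it.
char32_t nextCodePoint(const std::u16string& units, std::size_t& offset)
{
    const char32_t unit = units[offset++];
    if (unit < 0xD800 || unit > 0xDBFF) return unit;
    assert(offset < units.size()); // valid input: a trail surrogate follows
    const char32_t trail = units[offset++];
    return 0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00);
}

// Returns a negative value, zero, or a positive value (assumed convention).
int compareByCodePoint(const std::u16string& left, const std::u16string& right)
{
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < left.size() && j < right.size())
    {
        const char32_t a = nextCodePoint(left, i);
        const char32_t b = nextCodePoint(right, j);
        if (a != b) return a < b ? -1 : 1;
    }
    if (i < left.size()) return 1;   // right is a proper prefix of left
    if (j < right.size()) return -1; // left is a proper prefix of right
    return 0;
}
```

With this ordering, a Latin letter compares less than a Cyrillic letter, and U+FF5E compares less than U+10000 even though a plain code unit comparison would order them the other way.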
int UtfString::Utf16String::compare | ( | const std::basic_string< UInt16 > & | utf16String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf16String | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const UInt16 * | utf16String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf16String | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const UInt16 * | utf16String, |
const size_t | codeUnitCount | ||
) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf16String | A string to be compared to this string |
[in] | codeUnitCount | The number of code units in the string to be compared |
int UtfString::Utf16String::compare | ( | const wchar_t * | wideString) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | wideString | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const wchar_t * | wideString, |
const size_t | codeUnitCount | ||
) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | wideString | A string to be compared to this string |
[in] | codeUnitCount | The maximum number of code units in the string to be compared |
int UtfString::Utf16String::compare | ( | const std::wstring & | wideString) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | wideString | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const Utf8String & | utf8String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf8String | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const std::string & | utf8String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf8String | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const char * | utf8String) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf8String | A string to be compared to this string |
int UtfString::Utf16String::compare | ( | const char * | utf8String, |
const size_t | codeUnitCount | ||
) | const |
Compares the code points in this string with the code points in another string to determine whether both are equal or one is less than the other.
If this string is the same as the parameter string, then the two strings are considered equal. If the strings are different, then one is considered to be less than the other. The strings are compared "alphabetically" and placed in "alphabetical" order. The string that comes first in that order is considered to be less than the other string.
Note that "alphabetical" order is used in quotations because it isn't truly alphabetical. Different languages have different symbols and may have complex rules for the ordering of characters. This class does not attempt to address those issues, but instead compares code points based on their Unicode value. So any particular Latin code point will be considered to be less than any particular Cyrillic code point, because the Cyrillic code points have higher Unicode values. Within the English language, the code points are numbered so that they will be compared according to the rules of the language. This may or may not be the case for code points used by other languages.
If language- or locale-specific comparison is necessary, it would be better to use the ICU library.
[in] | utf8String | A string to be compared to this string |
[in] | codeUnitCount | The maximum number of code units in the string to be compared |
size_t UtfString::Utf16String::copy | ( | UInt16 * | codeUnitArray, |
const size_t | codeUnitArraySize, | ||
const size_t | characterCount, | ||
const size_t | characterOffset = 0 |
||
) | const |
Copies at most a specific number of code points in this string into an array of code units.
Although the array being copied to is an array of code units, the parameters that indicate which code points are to be copied in this string are the indexes of code points, not of code units. This string is intended to abstract the individual code units.
So if we copy three code points into the code unit array, as few as three or as many as six 16-bit code units will be copied to the code unit array, depending on how many code units comprise the three code points.
If the number of code points to be copied is such that there isn't enough room in the code unit array for the corresponding code units, the maximum possible number of code points will be copied without going over the boundaries of the array.
Note that this function does not append a null terminator at the end of the array being copied to.
This function assumes that codeUnitArray points to a valid array and that characterOffset < length().
[in] | codeUnitArray | An array of 16-bit code units that will contain the copied code points |
[in] | codeUnitArraySize | The size of the code unit array to be copied to |
[in] | characterCount | The maximum number of code points to be copied to the code unit array |
[in] | characterOffset | The code point offset in this string where the copying is to begin |
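The copying rules above (count in code points, limit in code units, never split a character) can be sketched like this. It is an illustration only: copyCodePoints is a hypothetical free function over std::u16string, and returning the number of code points copied is an assumption, since the meaning of the return value is not stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::size_t copyCodePoints(const std::u16string& source, char16_t* codeUnitArray,
                           std::size_t codeUnitArraySize, std::size_t characterCount,
                           std::size_t characterOffset = 0)
{
    assert(codeUnitArray != nullptr);
    // Skip characterOffset code points to find the first code unit to copy.
    std::size_t from = 0;
    for (std::size_t i = 0; i < characterOffset && from < source.size(); ++i)
        from += (source[from] >= 0xD800 && source[from] <= 0xDBFF) ? 2 : 1;

    std::size_t written = 0; // code units written so far
    std::size_t copied = 0;  // code points copied so far
    while (copied < characterCount && from < source.size())
    {
        const std::size_t width = (source[from] >= 0xD800 && source[from] <= 0xDBFF) ? 2 : 1;
        if (written + width > codeUnitArraySize) break; // never split a surrogate pair
        for (std::size_t k = 0; k < width; ++k) codeUnitArray[written++] = source[from++];
        ++copied;
    }
    return copied; // assumed return value; no null terminator is appended
}
```

For the string "a", U+1F600, "b" and a three-unit array, a request for three code points copies only two, because the third would not fit.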
const UInt16* UtfString::Utf16String::data | ( | ) | const |
Returns a pointer to an array of 16-bit code units containing the contents of this string.
This array is owned by this object, and the pointer returned by this function is invalidated if any non-const functions are called on this object.
This function is almost the same as the c_str() function: the only difference is that the array being returned by this function does not have a null terminator.
|
static |
Decodes a UTF-16 character, returning the result as a 32-bit code point.
This function assumes that utf16Character contains a valid UTF-16 character.
[in] | utf16Character | A series of code units representing a UTF-16 character |
|
static |
Decodes a UTF-16 character, returning the result as a 32-bit code point.
This function assumes that characterPointer points to a buffer containing a valid UTF-16 character. The length of the buffer must be between 1 and 2.
[in] | characterPointer | A pointer to a buffer containing a series of UTF-16 code units representing a UTF-16 character |
[in] | codeUnitCount | The length of the buffer pointed to by characterPointer |
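For reference, the standard UTF-16 decoding arithmetic is sketched below. The function name decodeUtf16 and the use of char16_t and char32_t in place of the library's own types are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Decodes one UTF-16 character of 1 or 2 code units into a code point.
char32_t decodeUtf16(const char16_t* characterPointer, std::size_t codeUnitCount)
{
    assert(characterPointer != nullptr);
    assert(codeUnitCount == 1 || codeUnitCount == 2);
    if (codeUnitCount == 1) return characterPointer[0];
    const char32_t lead = characterPointer[0];  // 0xD800-0xDBFF, upper 10 bits
    const char32_t trail = characterPointer[1]; // 0xDC00-0xDFFF, lower 10 bits
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600.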
bool UtfString::Utf16String::empty | ( | ) | const |
Indicates whether this string is empty.
If it should be the case that there are code units in the string but no valid code points, the string will be considered non-empty, and false will be returned.
|
static |
Encodes a 32-bit code point as a UTF-16 character.
This function assumes that codePoint falls in the valid Unicode character range of 000000-10FFFF.
[in] | codePoint | A 32-bit code point |
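The corresponding standard encoding step can be sketched as follows. Returning a std::u16string of one or two code units is an assumption made so the example is self-contained; the name encodeUtf16 is hypothetical.

```cpp
#include <cassert>
#include <string>

// Encodes one code point as one or two UTF-16 code units.
std::u16string encodeUtf16(char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF); // valid Unicode range, as assumed above
    std::u16string units;
    if (codePoint < 0x10000)
    {
        units.push_back(static_cast<char16_t>(codePoint));
    }
    else
    {
        const char32_t offset = codePoint - 0x10000; // 20 significant bits
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // lead
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // trail
    }
    return units;
}
```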
iterator UtfString::Utf16String::end | ( | ) |
Returns an iterator pointing to the location succeeding the last character in a string.
The iterator returned by this function is usually used to test whether an iterator has reached the end of a string. The iterator returned by this function should never be dereferenced, as it does not point to a part of the string.
const_iterator UtfString::Utf16String::end | ( | ) | const |
Returns a constant iterator pointing to the location succeeding the last character in a string.
The iterator returned by this function is usually used to test whether an iterator has reached the end of a string. The iterator returned by this function should never be dereferenced, as it does not point to a part of the string.
Utf16String::iterator UtfString::Utf16String::erase | ( | const Utf16String::iterator & | firstPosition, |
const Utf16String::iterator & | lastPosition | ||
) |
Removes a range of characters from this string.
[in] | firstPosition | An iterator pointing to the first character of the range to be removed |
[in] | lastPosition | An iterator pointing to the position one past the last character of the range to be removed |
Utf16String::iterator UtfString::Utf16String::erase | ( | const Utf16String::iterator & | position) |
Removes a character from this string.
[in] | position | An iterator pointing to the character to be removed |
Utf16String& UtfString::Utf16String::erase | ( | const size_t | offset = 0 , |
const size_t | count = npos |
||
) |
Removes a range of characters from this string.
This function will only cause characters to be removed up to the end of the string, so an overly large count parameter value will not cause problems.
This function assumes that offset <= length().
[in] | offset | The offset describing the index location of the first character to be removed |
[in] | count | The maximum number of characters to be removed |
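Because offset and count are given in characters, an erase of this kind first has to translate both into code unit positions. The sketch below shows one way to do that over std::u16string, assuming a valid string; skipCodePoints and eraseCodePoints are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Advances a code unit position past up to `codePoints` characters,
// stopping at the end of the string.
std::size_t skipCodePoints(const std::u16string& units, std::size_t position,
                           std::size_t codePoints)
{
    while (codePoints > 0 && position < units.size())
    {
        const char16_t unit = units[position];
        const bool isLead = unit >= 0xD800 && unit <= 0xDBFF;
        if (isLead)
        {
            assert(position + 1 < units.size()); // valid input: trail follows
        }
        position += isLead ? 2 : 1;
        --codePoints;
    }
    return position;
}

std::u16string& eraseCodePoints(std::u16string& units, std::size_t offset = 0,
                                std::size_t count = std::u16string::npos)
{
    const std::size_t first = skipCodePoints(units, 0, offset);
    const std::size_t last = skipCodePoints(units, first, count); // clamped to the end
    units.erase(first, last - first);
    return units;
}
```

An overly large count simply stops at the end of the string, matching the behaviour described above.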
size_t UtfString::Utf16String::find | ( | const Utf16String & | searchString, |
size_t | offset = 0 |
||
) |
Searches this string for a specific substring.
[in] | searchString | The substring to be found in this string |
[in] | offset | The index of the string at which the search is to begin |
size_t UtfString::Utf16String::find_first_not_of | ( | const Utf16String & | searchString, |
size_t | offset = 0 |
||
) |
Searches this string for the first character that is not found in a given string.
Note that if searchString is not a valid UTF-16 string, this function will still work, but the result may turn up an unexpected code point. For example, if the search string contains only the second code unit of a two-code-unit code point, that code point in the string being searched may still be the character identified by the search result, because even though the second code unit was in the search string, the first code unit of that code point was not. This is because there are numerous code points that could have that second code unit, and there is no way to distinguish between them if we are only given one code unit.
[in] | searchString | The string containing the characters that are to be excluded in the search. |
[in] | offset | The index of the string at which the search is to begin |
size_t UtfString::Utf16String::find_first_of | ( | const Utf16String & | searchString, |
size_t | offset = 0 |
||
) |
Searches this string for the first character that is found in a given string.
This function differs from find() in that find() searches for an exact occurrence of the search string, whereas this function searches for any one of the characters found in the search string.
This function assumes is_valid() is true and searchString.is_valid() is true.
[in] | searchString | The string containing the characters that are to be searched for |
[in] | offset | The index of the string at which the search is to begin |
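The difference from find() can be illustrated with a small sketch that works on decoded code points. It assumes both strings are valid, uses std::u16string as a stand-in, and returns std::u16string::npos when nothing matches; the helper names and that return convention are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decodes a valid UTF-16 string into a sequence of code points.
std::vector<char32_t> toCodePoints(const std::u16string& units)
{
    std::vector<char32_t> codePoints;
    for (std::size_t i = 0; i < units.size(); ++i)
    {
        char32_t value = units[i];
        if (value >= 0xD800 && value <= 0xDBFF)
        {
            assert(i + 1 < units.size()); // valid input: a trail surrogate follows
            const char32_t trail = units[++i];
            value = 0x10000 + ((value - 0xD800) << 10) + (trail - 0xDC00);
        }
        codePoints.push_back(value);
    }
    return codePoints;
}

// Returns the character index of the first character of `text`, at or after
// `offset`, that also occurs anywhere in `searchSet`.
std::size_t findFirstOf(const std::u16string& text, const std::u16string& searchSet,
                        std::size_t offset = 0)
{
    const std::vector<char32_t> haystack = toCodePoints(text);
    const std::vector<char32_t> wanted = toCodePoints(searchSet);
    for (std::size_t i = offset; i < haystack.size(); ++i)
    {
        if (std::find(wanted.begin(), wanted.end(), haystack[i]) != wanted.end())
            return i;
    }
    return std::u16string::npos;
}
```

Searching "hello" for the set "ol" finds the first 'l' at index 2, whereas an exact substring search for "ol" finds nothing.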
size_t UtfString::Utf16String::find_last_not_of | ( | const Utf16String & | searchString, |
size_t | offset = npos |
||
) |
Searches this string for the last character that is not found in a given string.
Note that if searchString is not a valid UTF-16 string, this function will still work, but the result may turn up an unexpected code point. For example, if the search string contains only the second code unit of a two-code-unit code point, that code point in the string being searched may still be the character identified by the search result, because even though the second code unit was in the search string, the first code unit of that code point was not. This is because there are numerous code points that could have that second code unit, and there is no way to distinguish between them if we are only given one code unit.
Please note that the offset in this function controls the index where the search ends, and not where it begins.
[in] | searchString | The string containing the characters that are to be excluded in the search. |
[in] | offset | The index of the string at which the search is to finish |
size_t UtfString::Utf16String::find_last_of | ( | const Utf16String & | searchString, |
size_t | offset = npos |
||
) |
Searches this string for the last character that is found in a given string.
This function differs from find() in that find() searches for an exact occurrence of the search string, whereas this function searches for any one of the characters found in the search string.
Please note that the offset in this function controls the index where the search ends, and not where it begins.
This function assumes is_valid() is true and searchString.is_valid() is true.
[in] | searchString | The string containing the characters that are to be searched for |
[in] | offset | The index of the string at which the search is to finish |
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const Utf16String & | utf16String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf16String | A string of 16-bit code units to be inserted |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
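As a rough illustration of inserting at a character index, the sketch below converts a code-point index into a code-unit offset before inserting. It uses std::u16string as a stand-in, and the helpers code_unit_offset and insert_at are hypothetical names rather than part of this class. Valid UTF-16 input is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a code-point index to a code-unit offset by
// walking the string; a high surrogate starts a two-code-unit character.
std::size_t code_unit_offset(const std::u16string& text, std::size_t codePointIndex) {
    std::size_t offset = 0;
    while (codePointIndex > 0 && offset < text.size()) {
        const bool isHighSurrogate = text[offset] >= 0xD800 && text[offset] <= 0xDBFF;
        offset += isHighSurrogate ? 2 : 1;
        --codePointIndex;
    }
    // Mirrors the documented assumption that index <= length().
    assert(codePointIndex == 0 && offset <= text.size());
    return offset;
}

// Hypothetical helper: inserts `inserted` before the character at
// `codePointIndex`; an index one past the last character appends.
std::u16string& insert_at(std::u16string& text, std::size_t codePointIndex,
                          const std::u16string& inserted) {
    return text.insert(code_unit_offset(text, codePointIndex), inserted);
}
```

In a string holding 'a', a two-code-unit character, and 'c', inserting at index 2 places the new text after the two-code-unit character rather than between its code units.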
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const std::basic_string< UInt16 > & | utf16String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf16String | A string of 16-bit code units to be inserted. The string is assumed to be a valid UTF-16 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const UInt16 * | utf16String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf16String | A string of 16-bit code units to be inserted. The string is assumed to be a valid UTF-16 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const UInt16 * | utf16String, | ||
const size_t | codeUnitCount | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf16String | A string of 16-bit code units to be inserted. The string is assumed to be a valid UTF-16 string. |
[in] | codeUnitCount | The number of code units in utf16String to be inserted |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const wchar_t * | wideString | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | wideString | A string of wchar_t to be inserted. The string is assumed to be a valid UTF-16 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const wchar_t * | wideString, | ||
const size_t | codeUnitCount | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | wideString | A string of wchar_t to be inserted. The string is assumed to be a valid UTF-16 string. |
[in] | codeUnitCount | The number of code units in wideString to be inserted |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const std::wstring & | wideString | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | wideString | A string of wchar_t to be inserted. The string is assumed to be a valid UTF-16 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const Utf8String & | utf8String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf8String | A UTF-8 string to be inserted. The string is assumed to be a valid UTF-8 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const std::string & | utf8String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf8String | A UTF-8 string to be inserted. The string is assumed to be a valid UTF-8 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const char * | utf8String | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf8String | A UTF-8 string to be inserted. The string is assumed to be a valid UTF-8 string. |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const char * | utf8String, | ||
const size_t | codeUnitCount | ||
) |
Inserts the contents of another string into this string at a specified index.
[in] | index | The index in this string where the parameter string is to be inserted |
[in] | utf8String | A UTF-8 string to be inserted. The string is assumed to be a valid UTF-8 string. |
[in] | codeUnitCount | The number of code units in utf8String to be inserted |
Note that text can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes index <= length().
Utf16String& UtfString::Utf16String::insert | ( | const size_t | index, |
const Utf16Char & | utf16Character | ||
) |
Inserts a character into this string at a specified index.
[in] | index | The index in this string where the character is to be inserted |
[in] | utf16Character | A UTF-16 character to be inserted. The character is assumed to be a valid UTF-16 character |
Note that the character can be inserted at the end of the string by specifying an index one past the end of the string.
This function assumes utf16Character is a valid character and that index <= length().
bool UtfString::Utf16String::is_valid | ( | ) | const |
Indicates whether the code units in this string comprise a valid UTF-16 string.
|
static |
Indicates whether the code units in a UTF-16 character comprise a valid UTF-16 character.
A one-code-unit character is valid if it is outside the range D800-DFFF, outside the range FDD0-FDEF, and is not FFFE or FFFF. A two-code-unit character is valid if the first code unit is in the range D800-DBFF, and the second code unit is in the range DC00-DFFF, and the character is not one of the 32 code points that are deemed to be non-characters.
The 32 non-character code points are 1FFF*, 2FFF*, ..., 10FFF*, where * is an E or an F.
[in] | utf16Character | A Utf16Char object containing the code units to be validated |
|
static |
Indicates whether a series of 16-bit code units is a valid UTF-16 character.
A one-code-unit character is valid if it is outside the range D800-DFFF, outside the range FDD0-FDEF, and is not FFFE or FFFF. A two-code-unit character is valid if the first code unit is in the range D800-DBFF, and the second code unit is in the range DC00-DFFF, and the character is not one of the 32 code points that are deemed to be non-characters.
The 32 non-character code points are 1FFF*, 2FFF*, ..., 10FFF*, where * is an E or an F.
[in] | characterPointer | A pointer to the buffer containing the series of 16-bit code units |
[in] | codeUnitCount | The number of code units in the characterPointer buffer |
|
static |
Indicates whether a UTF-16 character is a whitespace character.
This function tests for the standard ASCII whitespace characters (tab, space, carriage return, line feed), and the characters that the Unicode standard defines as being separator characters.
Note that all whitespace characters are single code-unit characters, and the code units that comprise multi-code-unit UTF-16 characters are within their own range, so a single code unit of a multi-code-unit character will never be thought to be a whitespace character. This makes it safe to pass in code units one at a time, no matter whether they are part of a multi-code-unit pair or not.
[in] | utf16Character | The character to be examined |
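A whitespace test of this kind might look like the sketch below. The exact set of characters the library checks is not listed here, so this version assumes the four ASCII characters named above plus the Unicode separator categories Zs, Zl, and Zp; the function name is_whitespace_sketch is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: true for tab, line feed, carriage return, space, and
// the Unicode separator characters, all of which fit in one code unit.
bool is_whitespace_sketch(char16_t unit) {
    switch (unit) {
        case 0x0009:  // tab
        case 0x000A:  // line feed
        case 0x000D:  // carriage return
        case 0x0020:  // space
        case 0x00A0:  // no-break space
        case 0x1680:  // ogham space mark
        case 0x2028:  // line separator
        case 0x2029:  // paragraph separator
        case 0x202F:  // narrow no-break space
        case 0x205F:  // medium mathematical space
        case 0x3000:  // ideographic space
            return true;
        default:
            // 2000-200A: en quad through hair space
            return unit >= 0x2000 && unit <= 0x200A;
    }
}
```

Because surrogate code units occupy D800-DFFF, none of them can match an entry in this set, which is why passing code units one at a time is safe.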
|
static |
Indicates whether a UTF-16 character is a whitespace character.
This function tests for the standard ASCII whitespace characters (tab, space, carriage return, line feed), and the characters that the Unicode standard defines as being separator characters.
[in] | utf16Character | The character to be examined |
size_t UtfString::Utf16String::length | ( | ) | const |
Returns the number of code points in this string.
Use this function if you're interested in how many characters are in a string. If you're interested in the number of code units (which may be different if any code points consist of multiple code units), then use the size() function.
This function does not check for validity, so it may return an incorrect result if the code units comprising this string do not form a valid UTF-16 string.
This function has O(N) performance, since we need to iterate through the code units to figure out how many code points there are. Counting each code point is an extremely quick operation, but due to the need to visit every code point in the string, it would be wise to be mindful of performance when making heavy use of this function on long strings in performance-sensitive code.
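To make the difference between code points and code units concrete, here is a small sketch that counts code points in a std::u16string, used as a stand-in for this class. The helper count_code_points is a hypothetical name, and like length() it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in one pass by skipping low
// surrogates, which are always the second unit of a two-code-unit character
// in valid UTF-16.
std::size_t count_code_points(const std::u16string& text) {
    std::size_t count = 0;
    for (const char16_t unit : text) {
        const bool isLowSurrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!isLowSurrogate) {
            ++count;
        }
    }
    return count;
}
```

For a string holding 'a', the code point 1F600, and 'b', the code-unit count is 4 while the code-point count is 3.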
bool UtfString::Utf16String::operator!= | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the two strings are different.
[in] | otherString | The string to be compared with this string |
bool UtfString::Utf16String::operator< | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the value of this string is less than the value of the other string.
The values of each string are determined by the Unicode values of the characters. This is similar to comparing strings in alphabetical order, where the character order is determined by the Unicode values and not the ordering of any particular alphabet.
In practice, this works out to be the same as alphabetical ordering for English-language strings, but may not be for strings in other languages.
[in] | otherString | The string to be compared with this string |
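Ordering by Unicode value can be illustrated with the sketch below, which decodes two std::u16string values into code points and compares them lexicographically. The helpers decode_code_points and less_by_code_point are hypothetical, valid UTF-16 is assumed, and whether the library compares code points or raw code units internally is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes valid UTF-16 into a sequence of code points.
std::vector<char32_t> decode_code_points(const std::u16string& text) {
    std::vector<char32_t> codePoints;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t codePoint = text[i];
        const bool isHighSurrogate = codePoint >= 0xD800 && codePoint <= 0xDBFF;
        if (isHighSurrogate && i + 1 < text.size()) {
            // Combine the surrogate pair and skip its second code unit.
            codePoint = 0x10000 + ((codePoint - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i;
        }
        codePoints.push_back(codePoint);
    }
    return codePoints;
}

// Hypothetical helper: true if `left` sorts before `right` by code point value.
bool less_by_code_point(const std::u16string& left, const std::u16string& right) {
    const std::vector<char32_t> l = decode_code_points(left);
    const std::vector<char32_t> r = decode_code_points(right);
    return std::lexicographical_compare(l.begin(), l.end(), r.begin(), r.end());
}
```

Under this ordering "zebra" sorts before "éclair", because 'z' (7A) has a lower Unicode value than 'é' (E9), even though a French dictionary would list them the other way round.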
bool UtfString::Utf16String::operator<= | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the value of this string is less than or equal to the value of the other string.
The values of each string are determined by the Unicode values of the characters. This is similar to comparing strings in alphabetical order, where the character order is determined by the Unicode values and not the ordering of any particular alphabet.
In practice, this works out to be the same as alphabetical ordering for English-language strings, but may not be for strings in other languages.
[in] | otherString | The string to be compared with this string |
bool UtfString::Utf16String::operator== | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the two strings are the same.
[in] | otherString | The string to be compared with this string |
bool UtfString::Utf16String::operator> | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the value of this string is greater than the value of the other string.
The values of each string are determined by the Unicode values of the characters. This is similar to comparing strings in alphabetical order, where the character order is determined by the Unicode values and not the ordering of any particular alphabet.
In practice, this works out to be the same as alphabetical ordering for English-language strings, but may not be for strings in other languages.
[in] | otherString | The string to be compared with this string |
bool UtfString::Utf16String::operator>= | ( | const Utf16String & | otherString) | const |
Compares the value of this string to the value of another string and tests whether the value of this string is greater than or equal to the value of the other string.
The values of each string are determined by the Unicode values of the characters. This is similar to comparing strings in alphabetical order, where the character order is determined by the Unicode values and not the ordering of any particular alphabet.
In practice, this works out to be the same as alphabetical ordering for English-language strings, but may not be for strings in other languages.
[in] | otherString | The string to be compared with this string |
Utf16CharReference UtfString::Utf16String::operator[] | ( | const size_t | index) |
Returns the character found at the specified character index.
This operator does not check for the validity of the index, so it assumes that index is valid. What happens when the index is invalid is undefined. If you want the index parameter to be validated, use the at() function instead.
Note that unlike standard ASCII or UCS-2, which are fixed-length encodings, UTF-16 is a variable-length encoding. This means that whereas accessing a character at a particular index is O(1) for fixed-length encodings, accessing a character in UTF-16 strings is O(1) in the best case and O(n) in the worst case.
So if you wish to iterate through the characters in this string, use the standard iterators instead of an indexer. The standard iterators will be far more efficient.
[in] | index | The index identifying the character to be retrieved |
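The cost of indexed access can be seen in the sketch below, which has to walk from the start of the string to reach the requested character. It uses std::u16string as a stand-in, the helper code_point_at is a hypothetical name, and valid UTF-16 is assumed. Calling it in a loop over every index would make the loop quadratic, which is why iterators are the better choice for traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the code point at a character index by
// stepping over every preceding character.
char32_t code_point_at(const std::u16string& text, std::size_t index) {
    std::size_t offset = 0;
    for (std::size_t skipped = 0; skipped < index; ++skipped) {
        assert(offset < text.size());
        const bool isHighSurrogate = text[offset] >= 0xD800 && text[offset] <= 0xDBFF;
        offset += isHighSurrogate ? 2 : 1;  // step over one whole character
    }
    assert(offset < text.size());  // the sketch checks the index; operator[] does not
    const char32_t first = text[offset];
    if (first >= 0xD800 && first <= 0xDBFF) {
        assert(offset + 1 < text.size());
        // Combine the surrogate pair into a single code point.
        return 0x10000 + ((first - 0xD800) << 10) + (text[offset + 1] - 0xDC00);
    }
    return first;
}
```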
const Utf16Char UtfString::Utf16String::operator[] | ( | const size_t | index) | const |
Returns the character found at the specified character index.
This operator does not check for the validity of the index, so it assumes that index is valid. What happens when the index is invalid is undefined. If you want the index parameter to be validated, use the at() function instead.
Note that unlike standard ASCII or UCS-2, which are fixed-length encodings, UTF-16 is a variable-length encoding. This means that whereas accessing a character at a particular index is O(1) for fixed-length encodings, accessing a character in UTF-16 strings is O(1) in the best case and O(n) in the worst case.
So if you wish to iterate through the characters in this string, use the standard iterators instead of an indexer. The standard iterators will be far more efficient.
[in] | index | The index identifying the character to be retrieved |
void UtfString::Utf16String::push_back | ( | const Utf16Char & | character) |
Appends a character to the end of this string.
This function is the equivalent of calling insert(length(), character) or append(character).
[in] | character | The character to be appended to the end of this string |
reverse_iterator UtfString::Utf16String::rbegin | ( | ) |
Returns an iterator pointing to the first character of a reversed string, which corresponds to the last character of the normal string.
const_reverse_iterator UtfString::Utf16String::rbegin | ( | ) | const |
Returns a constant iterator pointing to the first character of a reversed string, which corresponds to the last character of a normal string.
reverse_iterator UtfString::Utf16String::rend | ( | ) |
Returns an iterator pointing to the location succeeding the last character in a reversed string, which corresponds to the location preceding the first character in a normal string.
The iterator returned by this function is usually used to test whether an iterator has reached the end of a string. The iterator returned by this function should never be dereferenced, as it does not point to a part of the string.
const_reverse_iterator UtfString::Utf16String::rend | ( | ) | const |
Returns a constant iterator pointing to the location succeeding the last character in a reversed string, which corresponds to the location preceding the first character in a normal string.
The iterator returned by this function is usually used to test whether an iterator has reached the end of a string. The iterator returned by this function should never be dereferenced, as it does not point to a part of the string.
Utf16String& UtfString::Utf16String::replace | ( | const size_t | position, |
const size_t | count, | ||
const Utf16String & | replacementString | ||
) |
Removes a section of this string and replaces it with the contents of another string.
Note that if position is one index past the end of the string, replacementString will simply be appended to the end of the string.
This function assumes position <= length().
[in] | position | The index in the string identifying the beginning of the string section to be removed |
[in] | count | The maximum number of characters to be removed from this string |
[in] | replacementString | The string whose contents are to replace the section being removed |
Utf16String& UtfString::Utf16String::replace | ( | const size_t | position, |
const size_t | count, | ||
const size_t | characterCount, | ||
const Utf16Char & | character | ||
) |
Replaces the characters in a section of this string with the given character.
This function assumes position <= length().
[in] | position | The index in the string identifying the first character to be replaced |
[in] | count | The maximum number of characters to be replaced |
[in] | characterCount | The number of times the character is to be repeated in the replaced section |
[in] | character | The character to replace the characters in the identified section of this string |
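The parameters of this overload appear to follow the same pattern as the standard library's basic_string::replace(pos, count, count2, ch). The sketch below shows that pattern on a std::u32string, where every element is one code point; treating the two as equivalent is an assumption, and replace_with_repeated is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes up to `count` characters starting at
// `position` and puts `characterCount` copies of `character` in their place.
std::u32string replace_with_repeated(std::u32string text, std::size_t position,
                                     std::size_t count, std::size_t characterCount,
                                     char32_t character) {
    assert(position <= text.size());  // mirrors position <= length()
    // A count that runs past the end is clamped to the end of the string.
    text.replace(position, count, characterCount, character);
    return text;
}
```

Replacing the first five characters of "hello world" with three asterisks gives "*** world"; the removed section and the inserted run do not need to be the same length.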
Utf16String& UtfString::Utf16String::replace | ( | Utf16String::iterator | beginIterator, |
Utf16String::iterator | endIterator, | ||
const Utf16String & | replacementString | ||
) |
Removes a section of this string and replaces it with the contents of another string.
This function replaces the section of the string from beginIterator to endIterator - 1, where endIterator is pointing at a position one past the end of the section to be replaced.
If endIterator points to a position before beginIterator, endIterator is ignored and the entire string from beginIterator to the end of the string is replaced. If beginIterator points to the same position as endIterator, replacementString is simply inserted at that position and nothing in this string is removed.
[in] | beginIterator | An iterator pointing to the first character of the string section to be replaced |
[in] | endIterator | An iterator pointing to the position one past the last character of the string section to be replaced |
[in] | replacementString | The string whose contents are to replace the section being removed |
Utf16String& UtfString::Utf16String::replace | ( | Utf16String::iterator | beginIterator, |
Utf16String::iterator | endIterator, | ||
const size_t | characterCount, | ||
const Utf16Char & | character | ||
) |
Replaces the characters in a section of this string with the given character.
This function replaces the section of the string from beginIterator to endIterator - 1, where endIterator is pointing at a position one past the end of the section to be replaced.
If endIterator points to a position before beginIterator, endIterator is ignored and the entire string from beginIterator to the end of the string is replaced. If beginIterator points to the same position as endIterator, the new characters are simply inserted at that position and nothing in this string is removed.
[in] | beginIterator | An iterator pointing to the first character of the string section to be replaced |
[in] | endIterator | An iterator pointing to the position one past the last character of the string section to be replaced |
[in] | characterCount | The number of times the character is to be repeated in the replaced section |
[in] | character | The character to replace the characters in the identified section of this string |
size_t UtfString::Utf16String::rfind | ( | const Utf16String & | searchString, |
size_t | offset = npos |
||
) |
Searches this string backward for a specific substring.
Note this does not look at the characters in reverse order like iterating through a string with a reverse iterator. It looks at the characters in forward order just like the find() function, but starts at the end of the string and works backward toward the beginning.
[in] | searchString | The substring to be found in this string |
[in] | offset | The index of the string at which the search is to begin |
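The backward search can be sketched as trying candidate start positions from the end toward the beginning, while each candidate is still compared in forward order. The example uses std::u16string as a stand-in with single-code-unit text, and rfind_sketch is a hypothetical helper, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the highest start index, at or before
// `offset`, where `searchString` occurs in `text`.
std::size_t rfind_sketch(const std::u16string& text, const std::u16string& searchString,
                         std::size_t offset = std::u16string::npos) {
    if (searchString.size() > text.size()) {
        return std::u16string::npos;
    }
    // The last position where a full match could still fit.
    std::size_t start = std::min(offset, text.size() - searchString.size());
    for (;; --start) {
        // The comparison itself reads the characters in forward order.
        if (text.compare(start, searchString.size(), searchString) == 0) {
            return start;
        }
        if (start == 0) {
            break;
        }
    }
    return std::u16string::npos;
}
```

Searching "abcabc" for "bc" finds the occurrence at index 4, while searching for "cb" finds nothing, since the substring is never matched in reverse.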
size_t UtfString::Utf16String::size | ( | ) | const |
Returns the number of code units in this string.
If you are interested in how many characters are in this string, use the length() function instead.
Most likely this function has O(1) performance, since it is a simple count of the number of elements in a string. However, this depends on the implementation of the underlying STL basic_string<>::size() function.
Utf16String UtfString::Utf16String::substr | ( | const size_t | offset = 0 , |
const size_t | count = npos |
||
) |
Returns a substring of this string.
The offset parameter indicates which character in the string will become the first character of the substring and the count parameter indicates how many characters will be copied into the substring. If the value of count would cause characters beyond the end of this string to be copied, only characters from the offset to the end of the string will be copied.
This function assumes that offset < length().
[in] | offset | The string offset indicating the first character of the substring |
[in] | count | The number of characters to be copied into the substring |
void UtfString::Utf16String::swap | ( | Utf16String & | utf16String) |
Swaps the contents of this string with those of another string.
[in] | utf16String | The string whose contents are to be swapped with the contents of this string |
|
friend |
This operator converts a UTF-16 string to a stream of 16-bit values.
No checks for validity are done, so the resulting UTF-16 stream may or may not contain a valid UTF-16 string.
[in] | outputStream | The output stream to which the contents of the UTF-16 string are to be written |
[in] | utf16String | The UTF-16 string to be written to the output stream |
|
friend |
This operator converts a UTF-16 string to a wide stream of 16-bit values.
No checks for validity are done, so the resulting UTF-16 stream may or may not contain a valid UTF-16 string.
[in] | outputStream | The wide output stream to which the contents of the UTF-16 string are to be written |
[in] | utf16String | The UTF-16 string to be written to the output stream |
|
friend |
This operator converts a stream of 16-bit values to a UTF-16 string.
This function clears the contents of utf16String before the stream is converted. In addition this function assumes that the stream being converted is of the same endianness as the machine on which this function was compiled.
[in] | inputStream | The input stream containing 16-bit values to be converted to a UTF-16 string |
[in] | utf16String | The string object into which the converted UTF-16 string will be stored |
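The endianness assumption can be illustrated with a sketch that writes 16-bit code units to a byte stream in native byte order and reads them back the same way. The functions write_code_units and read_code_units are hypothetical stand-ins for the stream operators, and std::u16string stands in for this class; the library's real operators may be implemented differently.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>  // used by the usage example only
#include <string>

// Hypothetical helper: writes each code unit as two bytes in the byte order
// of the machine running the code, with no validity checks.
void write_code_units(std::ostream& out, const std::u16string& text) {
    out.write(reinterpret_cast<const char*>(text.data()),
              static_cast<std::streamsize>(text.size() * sizeof(char16_t)));
}

// Hypothetical helper: clears `text`, then rebuilds it from pairs of bytes,
// interpreting each pair in native byte order.
void read_code_units(std::istream& in, std::u16string& text) {
    text.clear();
    char bytes[sizeof(char16_t)];
    while (in.read(bytes, sizeof bytes)) {
        char16_t unit;
        std::memcpy(&unit, bytes, sizeof unit);
        text.push_back(unit);
    }
}
```

Writing a string to a std::stringstream with write_code_units and reading it back with read_code_units on the same machine reproduces the original code units. Data produced on a machine with the opposite byte order would need its bytes swapped first.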
|
friend |
This operator converts a wide stream of 16-bit values to a UTF-16 string.
This function clears the contents of utf16String before the stream is converted. In addition this function assumes that the stream being converted is of the same endianness as the machine on which this function was compiled.
[in] | inputStream | The wide input stream containing 16-bit values to be converted to a UTF-16 string |
[in] | utf16String | The string object into which the converted UTF-16 string will be stored |