UtfString
UtfString::Utf8Char Class Reference

Provides a copy of a UTF-8 character embedded in a UTF-8 string.

#include <Utf8Char.h>

Public Member Functions

 Utf8Char ()
 Initializes an empty Utf8Char.
 
 Utf8Char (const std::string::iterator &basicStringIterator, size_t codeUnitCount)
 Initializes an instance of Utf8Char using an iterator pointing to some code units and a count of the code units that comprise the character.
 
 Utf8Char (const std::string::const_iterator &basicStringIterator, size_t codeUnitCount)
 Initializes an instance of Utf8Char using a const_iterator pointing to some code units and a count of the code units that comprise the character.
 
 Utf8Char (const char character)
 Initializes an instance of Utf8Char using an 8-bit ASCII character.
 
 Utf8Char (const char *character)
 Initializes an instance of Utf8Char using a variable-code-unit character.
 
 Utf8Char (const Utf8Char &character)
 Initializes an instance of Utf8Char using another instance of Utf8Char.
 
virtual ~Utf8Char ()
 The class destructor.
 
bool operator== (const Utf8Char &otherCharacter) const
 Compares the value of this character to the value of another character and tests whether the two character values are the same.
 
bool operator!= (const Utf8Char &otherCharacter) const
 Compares the value of this character to the value of another character and tests whether the two character values are different.
 
bool operator== (const Utf8CharReference &characterReference) const
 Compares the value of this character to the value of a character reference and tests whether the two character values are the same.
 
bool operator!= (const Utf8CharReference &characterReference) const
 Compares the value of this character to the value of a character reference and tests whether the two character values are different.
 
Utf8Char & operator= (const Utf8Char &anotherCharacter)
 Assigns the contents of another Utf8Char object to this object.
 
Utf8Char & operator= (const Utf8CharReference &characterReference)
 Assigns the contents of a Utf8CharReference object to this object.
 
char & operator[] (const size_t index)
 Returns the code unit found at the specified index.
 
const char & operator[] (const size_t index) const
 Returns the code unit found at the specified index.
 
 operator Utf16Char () const
 Converts this object to a Utf16Char object.
 
const char * c_str () const
 Returns a C-style version of this character as an array of 8-bit code units.
 
void clear ()
 Clears the contents of the character, making it an empty character.
 
bool is_valid () const
 Indicates whether this character is a valid UTF-8 character.
 
UInt32 to_utf_32 () const
 Converts this character to a UTF-32 code point.
 
size_t size () const
 Returns the number of code units in this character.
 

Static Public Member Functions

static Utf8Char GetNextCharacter (std::istream &inputStream)
 Gets the next UTF-8 code point from a stream of 8-bit code units.
 

Friends

std::istream & operator>> (std::istream &inputStream, Utf8Char &utf8Char)
 This operator converts a stream of 8-bit values to a UTF-8 character.
 
std::ostream & operator<< (std::ostream &outputStream, const Utf8Char &utf8Char)
 This operator converts a UTF-8 character to a stream of 8-bit values.
 

Detailed Description

Provides a copy of a UTF-8 character embedded in a UTF-8 string.

Since a Utf8String provides an interface that hides the individual code units, a character is not directly pulled out of the Utf8String, but is constructed from the underlying code units. This class holds a copy of those code units as a single character. Changing the value of a Utf8Char will not cause anything to be changed anywhere else.

Utf8Char and Utf8CharReference objects can be assigned to each other or converted from one to the other.

Constructor & Destructor Documentation

UtfString::Utf8Char::Utf8Char ( const std::string::iterator &  basicStringIterator,
size_t  codeUnitCount 
)

Initializes an instance of Utf8Char using an iterator pointing to some code units and a count of the code units that comprise the character.

This constructor assumes that codeUnitCount does not exceed the maximum number of code units allowed by the UTF-8 encoding (four). It also assumes that at least codeUnitCount code units are available in the string, starting at the iterator's position. For example, if the iterator points to a one-code-unit character at the very end of the string and a codeUnitCount of 2 is passed in, this constructor reads past the end of the string, which is undefined behavior and will typically crash.

Parameters
[in]  basicStringIterator  An iterator pointing to the code units to be stored in this character
[in]  codeUnitCount  The number of code units that comprise this character
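
Because the constructor performs no bounds checking, the caller has to establish a safe codeUnitCount first. The following is a minimal sketch of such a check using only the standard library. The helper names `utf8_sequence_length` and `safe_code_unit_count` are hypothetical and are not part of UtfString.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: number of code units announced by a UTF-8 lead byte,
// or 0 if the byte cannot start a character (continuation or invalid byte).
inline std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: returns a code unit count that is safe to pass along
// with `it`, or 0 if the character would run past the end of `text`.
inline std::size_t safe_code_unit_count(const std::string &text,
                                        std::string::const_iterator it)
{
    if (it == text.end()) return 0;
    const std::size_t needed =
        utf8_sequence_length(static_cast<unsigned char>(*it));
    const std::size_t available =
        static_cast<std::size_t>(std::distance(it, text.end()));
    assert(available >= 1);  // guaranteed by the end() check above
    return (needed != 0 && needed <= available) ? needed : 0;
}
```

A caller would construct the character only when the returned count is nonzero, and treat 0 as "not a safe starting position".
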
UtfString::Utf8Char::Utf8Char ( const std::string::const_iterator &  basicStringIterator,
size_t  codeUnitCount 
)

Initializes an instance of Utf8Char using a const_iterator pointing to some code units and a count of the code units that comprise the character.

This constructor assumes that codeUnitCount does not exceed the maximum number of code units allowed by the UTF-8 encoding (four). It also assumes that at least codeUnitCount code units are available in the string, starting at the iterator's position. For example, if the iterator points to a one-code-unit character at the very end of the string and a codeUnitCount of 2 is passed in, this constructor reads past the end of the string, which is undefined behavior and will typically crash.

Parameters
[in]  basicStringIterator  A const_iterator pointing to the code units to be stored in this character
[in]  codeUnitCount  The number of code units that comprise this character
UtfString::Utf8Char::Utf8Char ( const char  character)

Initializes an instance of Utf8Char using an 8-bit ASCII character.

The character passed into this constructor is assumed to be a complete 8-bit ASCII character, and not a code unit that comprises some UTF-8 character. Note that this constructor only handles basic ASCII, the characters that match the first 128 code points in Unicode. Extended ASCII, which varies from platform to platform, will be converted to whatever code point happens to have the same value as the extended ASCII character.

Parameters
[in]  character  The 8-bit ASCII character to use in initializing this character
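
To make the extended-ASCII caveat concrete: a char holding the value 0xE9 is treated as code point U+00E9, which standard UTF-8 encodes as two code units rather than one. The sketch below shows that mapping with a hypothetical helper, `encode_byte_as_utf8`. It illustrates the encoding rule only and is not taken from the UtfString sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treats `value` as a code point in the range
// U+0000..U+00FF and returns its UTF-8 encoding (one or two code units).
inline std::string encode_byte_as_utf8(unsigned char value)
{
    std::string result;
    if (value < 0x80) {
        // Basic ASCII is encoded as a single, unchanged code unit.
        result.push_back(static_cast<char>(value));
    } else {
        // 0x80..0xFF needs a lead byte (110xxxxx) and one continuation byte.
        result.push_back(static_cast<char>(0xC0 | (value >> 6)));
        result.push_back(static_cast<char>(0x80 | (value & 0x3F)));
    }
    assert(result.size() == 1 || result.size() == 2);
    return result;
}
```

For example, `encode_byte_as_utf8(0xE9)` yields the two code units 0xC3 0xA9, regardless of which character the platform's extended ASCII table assigns to 0xE9.
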
UtfString::Utf8Char::Utf8Char ( const char *  character)

Initializes an instance of Utf8Char using a variable-code-unit character.

The character passed into this constructor is in the form of a character string, which can contain one or more code units. The character parameter is assumed to be a valid pointer to a null-terminated character string. If the length of character is 0, this character will be initialized as an empty character. If the length of character is 1-4, the code units will be used to initialize this character, whether those code units represent a valid UTF-8 character or not (validity can be checked using the is_valid() function). If the length of character is more than 4, only the first four code units will be used, and the rest will be ignored.

Parameters
[in]  character  A character string containing code units to use in initializing this character
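
The length handling described above can be sketched as follows. `first_code_units` is a hypothetical stand-in that returns the code units this constructor would keep. It assumes nothing about how Utf8Char stores them internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies at most four code units from a null-terminated
// string, stopping early at the terminator. No validity check is performed.
inline std::string first_code_units(const char *character)
{
    assert(character != nullptr);  // the documented precondition
    const std::size_t maxCodeUnits = 4;
    std::string result;
    while (result.size() < maxCodeUnits && character[result.size()] != '\0') {
        result.push_back(character[result.size()]);
    }
    return result;
}
```

An empty string yields an empty result, a three-unit character such as the euro sign is kept whole, and a longer string such as "abcdef" is cut to "abcd".
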
UtfString::Utf8Char::Utf8Char ( const Utf8Char &  character)

Initializes an instance of Utf8Char using another instance of Utf8Char.

This is a copy constructor. It sets the constructed instance to be the same as the Utf8Char instance passed in as a parameter.

Parameters
[in]  character  A character to use in initializing this character

Member Function Documentation

const char* UtfString::Utf8Char::c_str ( ) const

Returns a C-style version of this character as an array of 8-bit code units.

The C-style array is owned by this object, and the pointer returned by this function is invalidated if any non-const function is called on this object.

Returns
A pointer to a null-terminated array of 8-bit code units
static Utf8Char UtfString::Utf8Char::GetNextCharacter ( std::istream &  inputStream)
static

Gets the next UTF-8 code point from a stream of 8-bit code units.

This function assumes that inputStream is not at the end of the stream. If only a partial code point is available, this function returns as many code units as it can find. If the first code unit available is not a valid first code unit, the number of code units in the code point cannot be determined, so only a single code unit is returned.

Parameters
[in]  inputStream  The input stream from which the next code point will be retrieved
Returns
A UTF-8 character containing the next code point found in the input stream
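
The behavior described for this function can be approximated with the standard library alone. The sketch below uses a hypothetical free function, `read_next_utf8`, that returns the code units as a std::string. Stopping at the first byte that is not a continuation byte is an assumption about what "as many code units as it can find" means, not something the documentation states.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function out
#include <string>

// Hypothetical sketch: reads one UTF-8 character's worth of code units.
inline std::string read_next_utf8(std::istream &in)
{
    std::string units;
    const int first = in.get();
    if (first == std::char_traits<char>::eof()) return units;  // nothing to read
    units.push_back(static_cast<char>(first));

    const unsigned char lead = static_cast<unsigned char>(first);
    std::size_t expected = 1;  // also used when `lead` is not a valid first code unit
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    // Collect continuation bytes (10xxxxxx); stop early if the stream ends or
    // the next byte is not a continuation, returning the partial character.
    while (units.size() < expected) {
        const int next = in.peek();
        if (next == std::char_traits<char>::eof() || (next & 0xC0) != 0x80) break;
        units.push_back(static_cast<char>(in.get()));
    }
    assert(!units.empty() && units.size() <= 4);
    return units;
}
```

Unlike the documented function, this sketch tolerates an exhausted stream by returning an empty string, which makes it easier to call in a loop.
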
bool UtfString::Utf8Char::is_valid ( ) const

Indicates whether this character is a valid UTF-8 character.

This function assumes that size() is from 1 to 4.

Returns
true if the code units in this character represent a valid UTF-8 character, otherwise false
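
The documentation does not list which checks is_valid() performs. As a point of comparison, the sketch below applies the usual well-formedness rules for a single UTF-8 character: correct lead byte, correct number of continuation bytes, no overlong encoding, no surrogate, and no value above U+10FFFF. The function name `is_well_formed_utf8_char` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: true if `units` is exactly one well-formed UTF-8 character.
inline bool is_well_formed_utf8_char(const std::string &units)
{
    const std::size_t n = units.size();
    assert(n >= 1 && n <= 4);  // mirrors the documented size() assumption
    const unsigned char b0 = static_cast<unsigned char>(units[0]);

    std::size_t expected;
    unsigned long codePoint;
    if (b0 < 0x80)                { expected = 1; codePoint = b0; }
    else if ((b0 & 0xE0) == 0xC0) { expected = 2; codePoint = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { expected = 3; codePoint = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { expected = 4; codePoint = b0 & 0x07; }
    else return false;            // continuation byte or invalid lead byte
    if (n != expected) return false;

    for (std::size_t i = 1; i < n; ++i) {
        const unsigned char b = static_cast<unsigned char>(units[i]);
        if ((b & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        codePoint = (codePoint << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values above U+10FFFF.
    static const unsigned long minimum[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    if (codePoint < minimum[n]) return false;
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;
    return codePoint <= 0x10FFFF;
}
```

Whether Utf8Char::is_valid() rejects overlong forms and surrogates in the same way is not stated in this reference, so callers with strict requirements may want to verify that separately.
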
UtfString::Utf8Char::operator Utf16Char ( ) const

Converts this object to a Utf16Char object.

This operator assumes that this character is a valid UTF-8 character.

See Also
Utf8Char::is_valid()
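
For readers unfamiliar with the target encoding, the sketch below shows the standard mapping from a Unicode code point to UTF-16 code units, including surrogate pairs. It uses a hypothetical free function, `to_utf16_units`, and a std::vector in place of Utf16Char, whose interface is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: encodes a Unicode code point as one or two UTF-16 code units.
inline std::vector<std::uint16_t> to_utf16_units(std::uint32_t codePoint)
{
    // Precondition: a valid scalar value (not a surrogate, not above U+10FFFF).
    assert(codePoint <= 0x10FFFF && !(codePoint >= 0xD800 && codePoint <= 0xDFFF));
    std::vector<std::uint16_t> units;
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single code unit with the same value.
        units.push_back(static_cast<std::uint16_t>(codePoint));
    } else {
        const std::uint32_t offset = codePoint - 0x10000;  // 20 bits remain
        units.push_back(static_cast<std::uint16_t>(0xD800 + (offset >> 10)));    // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF)));  // low surrogate
    }
    return units;
}
```

The assertion reflects the same kind of precondition as the conversion operator: the input is expected to have been validated beforehand.
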
bool UtfString::Utf8Char::operator!= ( const Utf8Char &  otherCharacter) const

Compares the value of this character to the value of another character and tests whether the two character values are different.

Parameters
[in]  otherCharacter  The character to be compared with this character
Returns
true if the two character values are different, otherwise false
bool UtfString::Utf8Char::operator!= ( const Utf8CharReference &  characterReference) const

Compares the value of this character to the value of a character reference and tests whether the two character values are different.

Parameters
[in]  characterReference  The character reference to be compared with this character
Returns
true if the two character values are different, otherwise false
Utf8Char& UtfString::Utf8Char::operator= ( const Utf8Char &  anotherCharacter)

Assigns the contents of another Utf8Char object to this object.

Parameters
[in]  anotherCharacter  The other Utf8Char object whose contents are to be assigned to this object
Returns
A reference to this object
Utf8Char& UtfString::Utf8Char::operator= ( const Utf8CharReference &  characterReference)

Assigns the contents of a Utf8CharReference object to this object.

Parameters
[in]  characterReference  The Utf8CharReference object whose contents are to be assigned to this object
Returns
A reference to this object
bool UtfString::Utf8Char::operator== ( const Utf8Char &  otherCharacter) const

Compares the value of this character to the value of another character and tests whether the two character values are the same.

Parameters
[in]  otherCharacter  The character to be compared with this character
Returns
true if the two character values are the same, otherwise false
bool UtfString::Utf8Char::operator== ( const Utf8CharReference &  characterReference) const

Compares the value of this character to the value of a character reference and tests whether the two character values are the same.

Parameters
[in]  characterReference  The character reference to be compared with this character
Returns
true if the two character values are the same, otherwise false
char& UtfString::Utf8Char::operator[] ( const size_t  index)

Returns the code unit found at the specified index.

This operator does not check for the validity of the index, so it assumes that index is less than the maximum number of code units allowed by the UTF-8 encoding.

Note that if the given index isn't within the bounds of the character (whether the index is valid or not), the character will be resized to allow that index to be read and written to.

Parameters
[in]  index  The index identifying the code unit to be retrieved
Returns
The code unit found at the specified index
const char& UtfString::Utf8Char::operator[] ( const size_t  index) const

Returns the code unit found at the specified index.

This operator does not check for the validity of the index, so it assumes that index is less than the maximum number of code units allowed by the UTF-8 encoding.

Note that if the given index isn't within the bounds of the character (whether the index is valid or not), the character will not be resized, as this function can only be called on a constant object.

Parameters
[in]  index  The index identifying the code unit to be retrieved
Returns
The code unit found at the specified index
UInt32 UtfString::Utf8Char::to_utf_32 ( ) const

Converts this character to a UTF-32 code point.

This function assumes that size() is from 1 to 4.

Returns
This character as a UTF-32 code point
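
The conversion amounts to stripping the marker bits from each code unit and concatenating what remains. A minimal sketch follows, assuming the input is well formed and holds 1 to 4 code units. `decode_utf8_char` is a hypothetical free function, with std::uint32_t standing in for UInt32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: decodes one UTF-8 character (assumed well formed) to a code point.
inline std::uint32_t decode_utf8_char(const std::string &units)
{
    assert(units.size() >= 1 && units.size() <= 4);
    const unsigned char lead = static_cast<unsigned char>(units[0]);
    // Payload bits in the lead byte: 7, 5, 4, or 3 for sizes 1 through 4.
    static const unsigned char leadMask[] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
    std::uint32_t codePoint = lead & leadMask[units.size()];
    for (std::size_t i = 1; i < units.size(); ++i) {
        // Each continuation byte contributes its low six bits.
        codePoint = (codePoint << 6) | (static_cast<unsigned char>(units[i]) & 0x3F);
    }
    return codePoint;
}
```

Because no validation is done here, malformed input produces a meaningless value rather than an error, which is why a validity check beforehand is advisable.
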

Friends And Related Function Documentation

std::ostream& operator<< ( std::ostream &  outputStream,
const Utf8Char &  utf8Char 
)
friend

This operator converts a UTF-8 character to a stream of 8-bit values.

No checks for validity are done, so the resulting UTF-8 stream may or may not contain a valid UTF-8 character.

Parameters
[in]  outputStream  The output stream to which the contents of the UTF-8 character are to be written
[in]  utf8Char  The UTF-8 character to be written to the output stream
std::istream& operator>> ( std::istream &  inputStream,
Utf8Char &  utf8Char 
)
friend

This operator converts a stream of 8-bit values to a UTF-8 character.

This function clears the contents of utf8Char before the stream is converted. In addition, this function assumes that the stream being converted is of the same endianness as the machine on which this function was compiled.

Parameters
[in]  inputStream  The input stream containing 8-bit values to be converted to a UTF-8 character
[in]  utf8Char  The character object into which the converted UTF-8 character will be stored
