c++-gtk-utils
A class which will iterate through a std::string object by reference to unicode characters rather than by bytes.
#include <c++-gtk-utils/convert.h>
Public Types
typedef gunichar value_type
typedef gunichar reference
typedef void pointer
typedef std::string::difference_type difference_type
typedef std::bidirectional_iterator_tag iterator_category
Public Member Functions
Iterator & operator++ ()
Iterator operator++ (int)
Iterator & operator-- ()
Iterator operator-- (int)
Iterator & operator= (const std::string::const_iterator &iter)
Iterator & operator= (const std::string::iterator &iter)
Iterator & operator= (const Iterator &iter)
Iterator & operator= (const ReverseIterator &iter)
Iterator::value_type operator* () const
std::string::const_iterator base () const
Iterator (const std::string::const_iterator &iter)
Iterator (const std::string::iterator &iter)
Iterator (const Iterator &iter)
Iterator (const ReverseIterator &iter)
Iterator ()
A class which will iterate through a std::string object by reference to unicode characters rather than by bytes.
The Cgu::Utf8::Iterator class behaves like std::string::const_iterator, except that when iterating through a std::string object using the prefix and postfix ++ and -- operators, it moves by whole unicode code points rather than by bytes. In addition, the dereferencing operator returns the whole unicode code point (a UCS-4 gunichar type) rather than a char type.
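The difference between stepping by bytes and stepping by code points can be illustrated without the library. The following is a minimal sketch using only the standard library; `count_code_points` is a hypothetical helper, not part of c++-gtk-utils, and it assumes the string already holds valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of c++-gtk-utils): counts the code points in
// a valid UTF-8 string by counting every byte that is not a continuation byte.
inline std::size_t count_code_points(const std::string& utf8) {
  std::size_t n = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) ++n;  // continuation bytes look like 10xxxxxx
  }
  return n;
}
```

For the string "a\xC3\x9F\xE2\x82\xAC" (the characters a, ß and €), size() reports 6 bytes while this helper reports 3 code points, which is the number of steps a code-point iterator would take.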
Where, as in practically all unix-like systems, sizeof(wchar_t) == 4, the gunichar return value of the dereferencing operator can be converted to the wchar_t type by a simple static_cast. So far as displaying individual code points is concerned, however, it should be noted that because unicode allows combining characters, a unicode code point may not contain the whole representation of a character as displayed. This effect can be dealt with for all characters capable of representation by Level 1 unicode (i.e. by precomposed characters) by calling g_utf8_normalize() before iterating. There will still, however, be some non-European scripts, in particular some Chinese/Japanese/Korean ideograms, where description of the ideogram requires more than one code point to be finally resolved. For these, printing individual code points one by one directly to a display (say with std::wcout) may or may not have the desired result, depending on how the display device (e.g. a console) deals with that case.
A Cgu::Utf8::Iterator only allows reading from and not writing to the std::string object being iterated through. This is because in UTF-8 the representation of any one unicode code point requires a variable number of bytes: between 1 and 4 under RFC 3629 (the original UTF-8 definition allowed up to 6). Accordingly, modifying a UTF-8 string may change its length in bytes even though the number of unicode characters stays the same. For the same reason, this iterator is a bidirectional iterator but not a random access iterator.
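To make the variable width concrete, here is a minimal sketch of a UTF-8 encoder written against the standard library only. `encode_utf8` is a hypothetical helper introduced for illustration (it is not a c++-gtk-utils function), and it assumes the RFC 3629 range of U+0000 to U+10FFFF, excluding surrogates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of c++-gtk-utils): encodes one code point as
// UTF-8. Assumes the RFC 3629 range, so the result is 1 to 4 bytes long.
inline std::string encode_utf8(std::uint32_t cp) {
  assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
  std::string out;
  if (cp < 0x80) {                        // 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {                // 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {              // 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                                // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Replacing the character a (U+0061, one byte) with € (U+20AC, three bytes) would therefore grow the string by two bytes without changing the character count, which is why in-place writing through the iterator is not offered.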
The std::string object concerned should contain valid UTF-8 text. If necessary, this should be checked with Cgu::Utf8::validate() first. In addition, before use, the Cgu::Utf8::Iterator object must be initialized by a std::string::const_iterator or std::string::iterator object pointing to the first byte of a valid UTF-8 character in the string (or by another Cgu::Utf8::Iterator object or by a Cgu::Utf8::ReverseIterator object), and iteration will begin at the point of initialization: therefore, assuming the string contains valid UTF-8 text, passing std::string::begin() to a Cgu::Utf8::Iterator object will always be safe. Initialization by std::string::end() is also valid if the first iteration is backwards with the -- operator. This initialization can be done either in the constructor or by assignment. Comparison operators ==, !=, <, <=, > and >= are provided enabling the position of Cgu::Utf8::Iterator objects to be compared with each other or with std::string::const_iterator and std::string::iterator objects.
This is an example:
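The sketch below shows the usage pattern described above: initialization from std::string::begin() or std::string::end(), forward and backward stepping, dereferencing to a whole code point, and comparison against a plain std::string iterator. So that it compiles without GLib, it uses a hypothetical stand-in class, `CodePointIter`, written against the standard library only. It is not the library's implementation, and it assumes valid UTF-8 with sequences of 1 to 4 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for Cgu::Utf8::Iterator (NOT the library's code).
// Assumes the string holds valid UTF-8 with sequences of 1 to 4 bytes.
class CodePointIter {
  std::string::const_iterator pos;

  // Length of the sequence introduced by a lead byte.
  static int length_of(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
  }

public:
  // Implicit on purpose, so std::string iterators convert in comparisons.
  CodePointIter(std::string::const_iterator it) : pos(it) {}

  // Decode the whole code point at the current position.
  std::uint32_t operator*() const {
    const unsigned char lead = static_cast<unsigned char>(*pos);
    const int len = length_of(lead);
    if (len == 1) return lead;
    std::uint32_t cp = lead & (0xFF >> (len + 1));  // payload bits of the lead byte
    for (int i = 1; i < len; ++i)
      cp = (cp << 6) | (static_cast<unsigned char>(pos[i]) & 0x3F);
    return cp;
  }

  // Advance to the first byte of the next character.
  CodePointIter& operator++() {
    pos += length_of(static_cast<unsigned char>(*pos));
    return *this;
  }

  // Step back over continuation bytes (10xxxxxx) to the previous lead byte.
  CodePointIter& operator--() {
    do { --pos; } while ((static_cast<unsigned char>(*pos) & 0xC0) == 0x80);
    return *this;
  }

  std::string::const_iterator base() const { return pos; }

  friend bool operator==(const CodePointIter& a, const CodePointIter& b) { return a.pos == b.pos; }
  friend bool operator!=(const CodePointIter& a, const CodePointIter& b) { return a.pos != b.pos; }
};

// Forward pass: begin() is a safe starting point for valid UTF-8.
inline std::vector<std::uint32_t> forward(const std::string& s) {
  std::vector<std::uint32_t> out;
  for (CodePointIter it = s.begin(); it != s.end(); ++it) out.push_back(*it);
  return out;
}

// Backward pass: start at end() and decrement before dereferencing.
inline std::vector<std::uint32_t> backward(const std::string& s) {
  std::vector<std::uint32_t> out;
  CodePointIter it = s.end();
  while (it != s.begin()) { --it; out.push_back(*it); }
  return out;
}
```

With the real class, the same loops would be written with Cgu::Utf8::Iterator in place of `CodePointIter` and gunichar in place of std::uint32_t. The comparisons `it != s.end()` and `it != s.begin()` rely on the non-explicit converting constructor.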
In using g_utf8_next_char(), g_utf8_prev_char() and g_utf8_get_char(), this class assumes that the std::string object keeps its internal string in contiguous storage. This is required by the C++11/14 standard, but not formally by C++98/C++03. However, known implementations of std::string do in fact store the string contiguously.
typedef std::string::difference_type Cgu::Utf8::Iterator::difference_type
typedef std::bidirectional_iterator_tag Cgu::Utf8::Iterator::iterator_category
typedef void Cgu::Utf8::Iterator::pointer
typedef gunichar Cgu::Utf8::Iterator::reference
typedef gunichar Cgu::Utf8::Iterator::value_type
Cgu::Utf8::Iterator::Iterator (const std::string::const_iterator & iter) [inline]
Constructs this iterator and initialises it with a std::string::const_iterator object. It should point to the beginning of a UTF-8 character (e.g. std::string::begin()) or to std::string::end(). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation. This is a type conversion constructor (it is not marked explicit) so that it can be used with Cgu::Utf8::Iterator comparison operators to compare the position of Cgu::Utf8::Iterator objects with std::string::const_iterator objects.
Parameters
    iter    The std::string::const_iterator.
Cgu::Utf8::Iterator::Iterator (const std::string::iterator & iter) [inline]
Constructs this iterator and initialises it with a std::string::iterator object. It should point to the beginning of a UTF-8 character (e.g. std::string::begin()) or to std::string::end(). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation. This is a type conversion constructor (it is not marked explicit) so that it can be used with Cgu::Utf8::Iterator comparison operators to compare the position of Cgu::Utf8::Iterator objects with std::string::iterator objects.
Parameters
    iter    The std::string::iterator.
Cgu::Utf8::Iterator::Iterator (const Iterator & iter) [inline]
Constructs this iterator and initialises it with another Cgu::Utf8::Iterator object. It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The iterator.
Cgu::Utf8::Iterator::Iterator (const ReverseIterator & iter) [inline, explicit]
Constructs this iterator and initialises it with a Cgu::Utf8::ReverseIterator object, so that this iterator adopts the same physical position (but the logical position will be offset to the following UTF-8 character). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The iterator.
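The distinction between physical and logical position is the same one the standard library draws for std::reverse_iterator, where base() sits one element past the element that the reverse iterator dereferences to. Cgu::Utf8::ReverseIterator is not documented in this section, so the sketch below only illustrates that convention with a standard std::string reverse iterator; whether the library follows it in every detail is an assumption. `base_is_one_past` is a hypothetical helper name.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: returns true if the physical position (base()) of a
// standard reverse iterator is one past the element it logically refers to.
inline bool base_is_one_past(const std::string& s) {
  assert(!s.empty());
  std::string::const_reverse_iterator rit = s.rbegin();  // logically the last char
  std::string::const_iterator fwd = rit.base();          // physically s.end()
  return fwd == s.end() && *rit == *(fwd - 1);
}
```

A forward iterator built from the same physical position therefore refers to the element after the one the reverse iterator refers to, which matches the offset described above.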
Cgu::Utf8::Iterator::Iterator () [inline]
The default constructor will not throw.
std::string::const_iterator Cgu::Utf8::Iterator::base () const [inline]
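Iterator::value_type Cgu::Utf8::Iterator::operator* () const [inline]
The dereference operator. It returns the whole unicode code point at the current position as a gunichar.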
Iterator& Cgu::Utf8::Iterator::operator++ () [inline]
Increments the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the next UTF-8 character. It is a prefix operator. It will not throw.
Iterator Cgu::Utf8::Iterator::operator++ (int) [inline]
Increments the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the next UTF-8 character. It is a postfix operator. It will not throw provided that copy constructing and assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Iterator& Cgu::Utf8::Iterator::operator-- () [inline]
Decrements the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the previous UTF-8 character. It is a prefix operator. It will not throw.
Iterator Cgu::Utf8::Iterator::operator-- (int) [inline]
Decrements the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the previous UTF-8 character. It is a postfix operator. It will not throw provided that copy constructing and assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Iterator& Cgu::Utf8::Iterator::operator= (const Iterator & iter)
Assigns a Cgu::Utf8::Iterator object to this object. It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The iterator.
Iterator& Cgu::Utf8::Iterator::operator= (const ReverseIterator & iter) [inline]
Assigns a Cgu::Utf8::ReverseIterator object to this object, so that this iterator adopts the same physical position (but the logical position will be offset to the following UTF-8 character). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The iterator.
Iterator& Cgu::Utf8::Iterator::operator= (const std::string::const_iterator & iter) [inline]
Assigns a std::string::const_iterator object to this object. It should point to the beginning of a UTF-8 character (e.g. std::string::begin()) or to std::string::end(). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The std::string::const_iterator.
Iterator& Cgu::Utf8::Iterator::operator= (const std::string::iterator & iter) [inline]
Assigns a std::string::iterator object to this object. It should point to the beginning of a UTF-8 character (e.g. std::string::begin()) or to std::string::end(). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.
Parameters
    iter    The std::string::iterator.