c++-gtk-utils
Cgu::Utf8::Iterator Class Reference

A class which will iterate through a std::string object by reference to unicode characters rather than by bytes.

#include <c++-gtk-utils/convert.h>

Public Types

typedef gunichar value_type
 
typedef gunichar reference
 
typedef void pointer
 
typedef std::string::difference_type difference_type
 
typedef std::bidirectional_iterator_tag iterator_category
 

Public Member Functions

Iterator & operator++ ()
 
Iterator operator++ (int)
 
Iterator & operator-- ()
 
Iterator operator-- (int)
 
Iterator & operator= (const std::string::const_iterator &iter)
 
Iterator & operator= (const std::string::iterator &iter)
 
Iterator & operator= (const Iterator &iter)
 
Iterator & operator= (const ReverseIterator &iter)
 
Iterator::value_type operator* () const
 
std::string::const_iterator base () const
 
 Iterator (const std::string::const_iterator &iter)
 
 Iterator (const std::string::iterator &iter)
 
 Iterator (const Iterator &iter)
 
 Iterator (const ReverseIterator &iter)
 
 Iterator ()
 

Detailed Description

A class which will iterate through a std::string object by reference to unicode characters rather than by bytes.

See also
Cgu::Utf8::ReverseIterator

The Cgu::Utf8::Iterator class behaves like std::string::const_iterator, except that when iterating through a std::string object using the prefix and postfix ++ and -- operators, it advances by whole unicode code points rather than by bytes. In addition, the dereferencing operator returns the whole unicode code point (a UCS-4 gunichar type) rather than a char type.
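To make the difference between stepping by bytes and stepping by code points concrete, the following is a minimal standard-library sketch. It is not the library's implementation (which, as noted further down, relies on the GLib g_utf8_* functions); the names sequence_length, decode_at and code_points are hypothetical, char32_t stands in for gunichar, and valid UTF-8 input is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence introduced by lead byte `b`.
// Valid input is assumed, so only the 1- to 4-byte forms are handled.
inline std::size_t sequence_length(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  return 4;                          // 11110xxx
}

// Decode the code point whose first byte is at s[pos].
inline char32_t decode_at(const std::string& s, std::size_t pos) {
  assert(pos < s.size());
  const unsigned char lead = static_cast<unsigned char>(s[pos]);
  const std::size_t len = sequence_length(lead);
  if (len == 1) return lead;
  // Keep the payload bits of the lead byte, then append six bits
  // from each continuation byte (10xxxxxx).
  char32_t cp = lead & (0xFF >> (len + 1));
  for (std::size_t i = 1; i < len; ++i)
    cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
  return cp;
}

// Collect every code point, advancing one whole character at a time.
inline std::vector<char32_t> code_points(const std::string& s) {
  std::vector<char32_t> out;
  for (std::size_t pos = 0; pos < s.size();
       pos += sequence_length(static_cast<unsigned char>(s[pos])))
    out.push_back(decode_at(s, pos));
  return out;
}
```

A six-byte string holding 'a', U+00DF and U+20AC yields three values here, whereas a plain std::string::const_iterator would visit six separate char values.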

Where, as in practically all unix-like systems, sizeof(wchar_t) == 4, the gunichar return value of the dereferencing operator can be converted to the wchar_t type by a simple static_cast. So far as displaying individual code points is concerned, however, it should be noted that because unicode allows combining characters, a single unicode code point may not contain the whole representation of a character as displayed. This effect can be dealt with for all characters capable of representation by Level 1 unicode (i.e. by precomposed characters) by calling g_utf8_normalize() before iterating. There will still, however, be some non-European scripts, in particular some Chinese/Japanese/Korean ideograms, where description of the ideogram requires more than one code point to be finally resolved. For these, printing individual code points one by one directly to a display (say with std::wcout) may or may not have the desired result, depending on how the display device (e.g. a console) deals with that case.
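The effect of combining characters on iteration can be seen by counting code points. The sketch below uses only the standard library; count_code_points is a hypothetical helper (not part of c++-gtk-utils) which counts every byte that is not a UTF-8 continuation byte, and it assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by counting the bytes that start a character,
// i.e. every byte that is not a continuation byte (10xxxxxx).
inline std::size_t count_code_points(const std::string& s) {
  std::size_t n = 0;
  for (unsigned char b : s)
    if ((b & 0xC0) != 0x80) ++n;
  assert(n <= s.size());  // a code point never takes less than one byte
  return n;
}
```

With this helper, the precomposed form of 'é' ("\xC3\xA9", U+00E9) counts as one code point, while the decomposed form ("e\xCC\x81", 'e' followed by the combining acute accent U+0301) counts as two, although both display as the same character. An iterator that steps by code point would therefore visit the decomposed form in two steps unless the string is normalized to a composed form first.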

A Cgu::Utf8::Iterator only allows reading from and not writing to the std::string object being iterated through. This is because in UTF-8 the representation of any one unicode code point will require between 1 and 6 bytes (between 1 and 4 bytes for the code point range permitted by RFC 3629): accordingly, modifying a UTF-8 string may change its length (in bytes) even though the number of unicode characters stays the same. For the same reason, this iterator is a bidirectional iterator but not a random access iterator.
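The variable width of a code point can be illustrated with a small encoder. This is a standard-library sketch only: encode_utf8 is a hypothetical helper rather than a c++-gtk-utils function, it covers the one- to four-byte forms used for code points up to U+10FFFF, and it does not reject surrogate values.

```cpp
#include <cassert>
#include <string>

// Encode one code point as UTF-8. The number of bytes produced
// depends on the magnitude of the code point.
inline std::string encode_utf8(char32_t cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Replacing the 'a' in "cat" with encode_utf8(0xE4) (the character 'ä') leaves three characters but grows the string from three bytes to four, which is why in-place writing through a code point iterator is not offered, and why reaching the n-th character requires walking the string rather than adding an offset.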

The std::string object concerned should contain valid UTF-8 text. If necessary, this should be checked with Cgu::Utf8::validate() first. In addition, before use, the Cgu::Utf8::Iterator object must be initialized by a std::string::const_iterator or std::string::iterator object pointing to the first byte of a valid UTF-8 character in the string (or by another Cgu::Utf8::Iterator object or by a Cgu::Utf8::ReverseIterator object), and iteration will begin at the point of initialization: therefore, assuming the string contains valid UTF-8 text, passing std::string::begin() to a Cgu::Utf8::Iterator object will always be safe. Initialization by std::string::end() is also valid if the first iteration is backwards with the -- operator. This initialization can be done either in the constructor or by assignment. Comparison operators ==, !=, <, <=, > and >= are provided enabling the position of Cgu::Utf8::Iterator objects to be compared with each other or with std::string::const_iterator and std::string::iterator objects.
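The requirement that initialization points to the first byte of a character, and the rule that std::string::end() is a valid starting point when the first step is backwards, can be sketched with byte offsets. The helpers is_boundary and previous_boundary are hypothetical and use only the standard library; they assume the string already holds valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `pos` is a position at which a code point iterator may be
// initialized: the end of the string, or a byte that is not a
// continuation byte (10xxxxxx).
inline bool is_boundary(const std::string& s, std::size_t pos) {
  assert(pos <= s.size());
  return pos == s.size() ||
         (static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80;
}

// Step back from a boundary (possibly the end of the string) to the
// start of the preceding character.
inline std::size_t previous_boundary(const std::string& s, std::size_t pos) {
  assert(pos > 0 && pos <= s.size());
  do {
    --pos;
  } while (pos > 0 && !is_boundary(s, pos));
  return pos;
}
```

Starting from s.size() never reads the one-past-the-end byte, because the first action is to move back. Starting in the middle of a multi-byte sequence, by contrast, is exactly the case that the initialization requirement above rules out.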

This is an example:

using namespace Cgu;

std::wstring wide_str(L"ßøǿón");
std::string narrow_str(Utf8::uniwide_to_utf8(wide_str));

// Step through the UTF-8 string one code point at a time.
Utf8::Iterator iter;
for (iter = narrow_str.begin();
     iter != narrow_str.end();
     ++iter)
  std::wcout << static_cast<wchar_t>(*iter) << std::endl;

This class assumes in using g_utf8_next_char(), g_utf8_prev_char() and g_utf8_get_char() that the std::string object keeps its internal string in contiguous storage. This is required by the C++11/14 standard, but not formally by C++98/C++03. However, known implementations of std::string in fact store the string contiguously.
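The contiguity assumption matters because the GLib functions named above operate on plain character pointers, so a position held as a std::string iterator has to be turned into a pointer into the same buffer. The following sketch shows one way that conversion might look; to_pointer is a hypothetical helper and is not claimed to be how the class does it.

```cpp
#include <cassert>
#include <string>

// Convert an iterator into `s` to a raw pointer into the same buffer.
// The pointer arithmetic is only meaningful if the characters of `s`
// are stored contiguously.
inline const char* to_pointer(const std::string& s,
                              std::string::const_iterator it) {
  assert(it >= s.begin() && it <= s.end());
  return s.data() + (it - s.begin());
}
```

The returned pointer could then be handed to a C API that expects a const char*, provided the string is not modified while the pointer is in use.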

Member Typedef Documentation

◆ difference_type

typedef std::string::difference_type Cgu::Utf8::Iterator::difference_type

◆ iterator_category

typedef std::bidirectional_iterator_tag Cgu::Utf8::Iterator::iterator_category

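◆ pointer

typedef void Cgu::Utf8::Iterator::pointer

◆ reference

typedef gunichar Cgu::Utf8::Iterator::reference

◆ value_type

typedef gunichar Cgu::Utf8::Iterator::value_type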

Constructor & Destructor Documentation

◆ Iterator() [1/5]

Cgu::Utf8::Iterator::Iterator ( const std::string::const_iterator &  iter)
inline

Constructs this iterator and initialises it with a std::string::const_iterator object. It should point to the beginning of a UTF-8 character (eg std::string::begin()) or to std::string::end(). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation. This is a type conversion constructor (it is not marked explicit) so that it can be used with Cgu::Utf8::Iterator comparison operators to compare the position of Cgu::Utf8::Iterator with std::string::const_iterator objects.

Parameters
iter  The std::string::const_iterator.
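The role of the non-explicit constructor in mixed comparisons can be shown with a stand-in class. Cursor below is hypothetical and deliberately minimal; it only mirrors the pattern described above (a converting constructor plus comparison operators taking two wrapper objects) and is not the definition of Cgu::Utf8::Iterator.

```cpp
#include <cassert>
#include <string>

class Cursor {
public:
  // Deliberately not explicit: a std::string::const_iterator can be
  // converted implicitly wherever a Cursor is expected.
  Cursor(const std::string::const_iterator& it) : pos_(it) {}
  std::string::const_iterator base() const { return pos_; }

private:
  std::string::const_iterator pos_;
};

// Only Cursor-to-Cursor comparisons are declared; comparisons with a
// raw const_iterator work through the implicit conversion.
inline bool operator==(const Cursor& a, const Cursor& b) {
  return a.base() == b.base();
}
inline bool operator!=(const Cursor& a, const Cursor& b) {
  return !(a == b);
}

inline void demo() {
  const std::string s("abc");
  Cursor c(s.begin());
  assert(c == s.begin());  // right operand converted to Cursor
  assert(s.begin() == c);  // left operand converted to Cursor
  assert(c != s.end());
}
```

Had the constructor been marked explicit, each of these comparisons would need a separate overload or an explicit conversion at the call site. Because only one user-defined conversion is applied implicitly, a std::string::iterator would not in general convert through const_iterator to the wrapper, which is presumably why the class also offers a separate constructor taking std::string::iterator.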

◆ Iterator() [2/5]

Cgu::Utf8::Iterator::Iterator ( const std::string::iterator &  iter)
inline

Constructs this iterator and initialises it with a std::string::iterator object. It should point to the beginning of a UTF-8 character (eg std::string::begin()) or to std::string::end(). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation. This is a type conversion constructor (it is not marked explicit) so that it can be used with Cgu::Utf8::Iterator comparison operators to compare the position of Cgu::Utf8::Iterator with std::string::iterator objects.

Parameters
iter  The std::string::iterator.

◆ Iterator() [3/5]

Cgu::Utf8::Iterator::Iterator ( const Iterator &  iter)
inline

Constructs this iterator and initialises it with another Cgu::Utf8::Iterator object. It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The iterator.

◆ Iterator() [4/5]

Cgu::Utf8::Iterator::Iterator ( const ReverseIterator &  iter)
inline explicit

Constructs this iterator and initialises it with a Cgu::Utf8::ReverseIterator object, so that this iterator adopts the same physical position (but the logical position will be offset to the following UTF-8 character). It will not throw provided that copy constructing a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The iterator.
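The distinction between physical and logical position is the same one that std::reverse_iterator makes between base() and the element it dereferences to. The sketch below demonstrates it with std::string::const_reverse_iterator on single-byte characters; that Cgu::Utf8::ReverseIterator follows the same convention is an assumption drawn from the wording above, and Positions and inspect are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct Positions {
  char logical;   // what *rit yields
  char physical;  // what *rit.base() yields
};

// Report the character a reverse iterator refers to and the character
// at the underlying forward position it holds.
inline Positions inspect(const std::string& s,
                         std::string::const_reverse_iterator rit) {
  assert(rit != s.crend() && rit.base() != s.end());
  Positions p = {*rit, *rit.base()};
  return p;
}
```

For the string "abc", a reverse iterator advanced once from crbegin() refers to 'b', while its base() points at 'c'. A forward iterator built from that position therefore refers to the character following the one the reverse iterator referred to.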

◆ Iterator() [5/5]

Cgu::Utf8::Iterator::Iterator ( )
inline

The default constructor will not throw.

Member Function Documentation

◆ base()

std::string::const_iterator Cgu::Utf8::Iterator::base ( ) const
inline
Returns
The current underlying std::string::const_iterator kept by this iterator. Once this iterator has been correctly initialized, that will point to the beginning of the UTF-8 character currently represented by this iterator or to std::string::end(). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

◆ operator*()

Iterator::value_type Cgu::Utf8::Iterator::operator* ( ) const
inline

The dereference operator.

Returns
A 32-bit gunichar object containing the whole unicode code point which is currently represented by this iterator. It will not throw.

◆ operator++() [1/2]

Iterator & Cgu::Utf8::Iterator::operator++ ( )
inline

Increments the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the next UTF-8 character. It is a prefix operator. It will not throw.

Returns
A reference to the iterator in its new position.

◆ operator++() [2/2]

Iterator Cgu::Utf8::Iterator::operator++ ( int  )
inline

Increments the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the next UTF-8 character. It is a postfix operator. It will not throw provided that copy constructing and assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Returns
A copy of the iterator in its former position.
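The different return values of the prefix and postfix forms can be sketched with a minimal stand-in. Stepper is hypothetical, tracks a byte offset instead of wrapping a std::string::const_iterator, and assumes valid UTF-8; it only illustrates the convention documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class Stepper {
public:
  explicit Stepper(const std::string& s) : s_(&s), pos_(0) {}

  // Prefix: move to the start of the next character and return a
  // reference to this object in its new position.
  Stepper& operator++() {
    assert(pos_ < s_->size());
    ++pos_;
    while (pos_ < s_->size() &&
           (static_cast<unsigned char>((*s_)[pos_]) & 0xC0) == 0x80)
      ++pos_;
    return *this;
  }

  // Postfix: move likewise, but return a copy taken before the move.
  Stepper operator++(int) {
    Stepper old(*this);
    ++*this;
    return old;
  }

  std::size_t offset() const { return pos_; }

private:
  const std::string* s_;
  std::size_t pos_;
};
```

The postfix form has to make a copy, which is why its no-throw guarantee is stated in terms of copy constructing and assigning the underlying iterator, whereas the prefix form is unconditionally non-throwing.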

◆ operator--() [1/2]

Iterator & Cgu::Utf8::Iterator::operator-- ( )
inline

Decrements the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the previous UTF-8 character. It is a prefix operator. It will not throw.

Returns
A reference to the iterator in its new position.

◆ operator--() [2/2]

Iterator Cgu::Utf8::Iterator::operator-- ( int  )
inline

Decrements the iterator so that it moves from the beginning of the current UTF-8 character to the beginning of the previous UTF-8 character. It is a postfix operator. It will not throw provided that copy constructing and assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Returns
A copy of the iterator in its former position.

◆ operator=() [1/4]

Iterator & Cgu::Utf8::Iterator::operator= ( const Iterator &  iter)
inline

Assigns a Cgu::Utf8::Iterator object to this object. It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The iterator.
Returns
A reference to this Cgu::Utf8::Iterator object after assignment.

◆ operator=() [2/4]

Iterator & Cgu::Utf8::Iterator::operator= ( const ReverseIterator &  iter)
inline

Assigns a Cgu::Utf8::ReverseIterator object to this object, so that this iterator adopts the same physical position (but the logical position will be offset to the following UTF-8 character). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The iterator.
Returns
A reference to this Cgu::Utf8::Iterator object after assignment.

◆ operator=() [3/4]

Iterator& Cgu::Utf8::Iterator::operator= ( const std::string::const_iterator &  iter)
inline

Assigns a std::string::const_iterator object to this object. It should point to the beginning of a UTF-8 character (eg std::string::begin()) or to std::string::end(). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The std::string::const_iterator.
Returns
A reference to this Cgu::Utf8::Iterator object after assignment.

◆ operator=() [4/4]

Iterator& Cgu::Utf8::Iterator::operator= ( const std::string::iterator &  iter)
inline

Assigns a std::string::iterator object to this object. It should point to the beginning of a UTF-8 character (eg std::string::begin()) or to std::string::end(). It will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

Parameters
iter  The std::string::iterator.
Returns
A reference to this Cgu::Utf8::Iterator object after assignment.

The documentation for this class was generated from the following file:
c++-gtk-utils/convert.h