c++-gtk-utils
Cgu::Utf8 Namespace Reference

This namespace contains utilities relevant to the use of UTF-8 in programs.

Classes

class  ConversionError
 
class  Iterator
 A class which will iterate through a std::string object by reference to Unicode characters rather than by bytes.
 
class  Reassembler
 A class for reassembling UTF-8 strings sent over pipes and sockets so they form complete valid UTF-8 characters.
 
class  ReverseIterator
 A class which will iterate in reverse through a std::string object by reference to Unicode characters rather than by bytes.
 

Functions

std::wstring uniwide_from_utf8 (const std::string &input)
 
std::string uniwide_to_utf8 (const std::wstring &input)
 
std::u32string utf32_from_utf8 (const std::string &input)
 
std::string utf32_to_utf8 (const std::u32string &input)
 
std::u16string utf16_from_utf8 (const std::string &input)
 
std::string utf16_to_utf8 (const std::u16string &input)
 
std::wstring wide_from_utf8 (const std::string &input)
 
std::string wide_to_utf8 (const std::wstring &input)
 
std::string filename_from_utf8 (const std::string &input)
 
std::string filename_to_utf8 (const std::string &input)
 
std::string locale_from_utf8 (const std::string &input)
 
std::string locale_to_utf8 (const std::string &input)
 
bool validate (const std::string &text)
 
bool operator== (const Iterator &iter1, const Iterator &iter2)
 
bool operator!= (const Iterator &iter1, const Iterator &iter2)
 
bool operator< (const Iterator &iter1, const Iterator &iter2)
 
bool operator<= (const Iterator &iter1, const Iterator &iter2)
 
bool operator> (const Iterator &iter1, const Iterator &iter2)
 
bool operator>= (const Iterator &iter1, const Iterator &iter2)
 
bool operator== (const ReverseIterator &iter1, const ReverseIterator &iter2)
 
bool operator!= (const ReverseIterator &iter1, const ReverseIterator &iter2)
 
bool operator< (const ReverseIterator &iter1, const ReverseIterator &iter2)
 
bool operator<= (const ReverseIterator &iter1, const ReverseIterator &iter2)
 
bool operator> (const ReverseIterator &iter1, const ReverseIterator &iter2)
 
bool operator>= (const ReverseIterator &iter1, const ReverseIterator &iter2)
 

Detailed Description

This namespace contains utilities relevant to the use of UTF-8 in programs.

#include <c++-gtk-utils/convert.h> (for conversion and validation functions)

#include <c++-gtk-utils/reassembler.h> (for Reassembler class)

See also
convert.h reassembler.h

For these functions to work, the program will generally need to have set the locale beforehand, with either std::locale::global(std::locale("")) (from the C++ standard library) or setlocale(LC_ALL,"") (from the C standard library).
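The locale setup described above could be wrapped as follows. This is only a sketch using the standard library: the helper name set_environment_locale() is hypothetical and is not part of c++-gtk-utils.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not part of c++-gtk-utils): adopt the locale named by
// the environment, trying the C++ call first and the C call as a fallback.
bool set_environment_locale() {
    try {
        // For a named locale this also calls std::setlocale(LC_ALL, ...).
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        // std::locale("") throws if the environment names an unknown locale.
        return std::setlocale(LC_ALL, "") != nullptr;
    }
}
```

Calling such a helper once at the start of main(), before any conversion function is used, would match the requirement stated above.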

Function Documentation

std::string Cgu::Utf8::filename_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's filename encoding.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to filename encoding.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to filename encoding (e.g. because the input characters cannot be represented by that encoding).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
Note
glib takes the system's filename encoding from the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. If G_BROKEN_FILENAMES is set to 1 and G_FILENAME_ENCODING is not set, it will be assumed that the filename encoding is the same as the locale encoding. If G_FILENAME_ENCODING is set, then G_BROKEN_FILENAMES is ignored, and filename encoding is taken from the value held by G_FILENAME_ENCODING.
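The precedence described in this note can be sketched as a small pure function. The function name and parameters are hypothetical. The final fallback to UTF-8 when neither variable is set is an assumption based on glib's usual behaviour on unix-like systems, not something stated in the note. This is not how glib itself is implemented.

```cpp
#include <string>

// Hypothetical illustration of the precedence in the note above.
// Each pointer is the value of the environment variable, or nullptr if unset.
std::string resolve_filename_encoding(const char* g_filename_encoding,
                                      const char* g_broken_filenames,
                                      const std::string& locale_encoding) {
    // G_FILENAME_ENCODING, when set, wins and G_BROKEN_FILENAMES is ignored.
    if (g_filename_encoding && *g_filename_encoding)
        return g_filename_encoding;
    // Otherwise G_BROKEN_FILENAMES=1 selects the locale encoding.
    if (g_broken_filenames && std::string(g_broken_filenames) == "1")
        return locale_encoding;
    // Assumption: with neither variable set, filenames are treated as UTF-8.
    return "UTF-8";
}
```

In a real program the first two arguments would typically come from std::getenv("G_FILENAME_ENCODING") and std::getenv("G_BROKEN_FILENAMES").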
std::string Cgu::Utf8::filename_to_utf8 ( const std::string &  input)

Converts text from the system's filename encoding to UTF-8.

Parameters
input: Text in valid filename encoding.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid filename encoding.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
Note
glib takes the system's filename encoding from the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. If G_BROKEN_FILENAMES is set to 1 and G_FILENAME_ENCODING is not set, it will be assumed that the filename encoding is the same as the locale encoding. If G_FILENAME_ENCODING is set, then G_BROKEN_FILENAMES is ignored, and filename encoding is taken from the value held by G_FILENAME_ENCODING.
std::string Cgu::Utf8::locale_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's locale encoding.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to locale encoding.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to locale encoding (e.g. because the input characters cannot be represented by that encoding).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::string Cgu::Utf8::locale_to_utf8 ( const std::string &  input)

Converts text from the system's locale encoding to UTF-8.

Parameters
input: Text in valid locale encoding.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid locale encoding.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
bool Cgu::Utf8::operator!= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator!= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator< ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator< ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().
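The ordering convention can be illustrated with the standard library's own reverse iterators, which follow the same rule. This sketch does not use Cgu::Utf8::ReverseIterator, and the function name is hypothetical.

```cpp
#include <string>

// Illustration with std::string's reverse iterators (not
// Cgu::Utf8::ReverseIterator): ordering follows the direction of travel,
// so the starting point of reverse iteration compares less than its end.
bool rbegin_precedes_rend(const std::string& s) {
    // For an empty string rbegin() == rend(), so this returns false.
    return s.rbegin() < s.rend();
}
```

For any non-empty string this returns true, even though rbegin() refers to a later byte in memory than rend() does.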

bool Cgu::Utf8::operator<= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator<= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

bool Cgu::Utf8::operator== ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator== ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator> ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator> ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

bool Cgu::Utf8::operator>= ( const Iterator &  iter1, const Iterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation.

bool Cgu::Utf8::operator>= ( const ReverseIterator &  iter1, const ReverseIterator &  iter2)
inline

The comparison operators will not throw provided assigning a std::string::const_iterator object does not throw, as it will not in any sane implementation. Ordering is viewed from the perspective of the logical operation (reverse iteration), so that for example an iterator at position std::string::rbegin() is less than an iterator at position std::string::rend().

std::wstring Cgu::Utf8::uniwide_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's Unicode wide character representation, which will be UTF-32/UCS-4 for systems with a wide character size of 4 (almost all unix-like systems), and UTF-16 for systems with a wide character size of 2.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to UTF-32 or UTF-16.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
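Which representation applies on a given system follows from the size of wchar_t, as described above. A minimal sketch, with a hypothetical helper name that is not part of the library:

```cpp
#include <string>

// Hypothetical helper: name the Unicode encoding implied by sizeof(wchar_t),
// following the rule in the description above.
std::string wide_unicode_encoding() {
    if (sizeof(wchar_t) == 4) return "UTF-32";
    if (sizeof(wchar_t) == 2) return "UTF-16";
    return "";  // neither size: no Unicode wide representation assumed
}
```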
std::string Cgu::Utf8::uniwide_to_utf8 ( const std::wstring &  input)

Converts text from the system's Unicode wide character representation, which will be UTF-32/UCS-4 for systems with a wide character size of 4 (almost all unix-like systems) and UTF-16 for systems with a wide character size of 2, to narrow character UTF-8 format.

Parameters
input: Text in valid UTF-32 or UTF-16 format.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-32/UCS-4 or UTF-16 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::u16string Cgu::Utf8::utf16_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to UTF-16.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to UTF-16.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
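As background on what a UTF-16 result contains, the sketch below shows how a single Unicode code point maps to one or two UTF-16 code units. It is a hand-written illustration with a hypothetical function name, not the library's implementation, and it throws std::invalid_argument rather than Cgu::Utf8::ConversionError.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-16 string.
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

For example, U+20AC yields the single unit 0x20AC, while U+1F600 yields the surrogate pair 0xD83D 0xDE00.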
std::string Cgu::Utf8::utf16_to_utf8 ( const std::u16string &  input)

Converts text from UTF-16 to narrow character UTF-8 format.

Parameters
input: Text in valid UTF-16 format.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-16 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::u32string Cgu::Utf8::utf32_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to UTF-32/UCS-4.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to UTF-32.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
std::string Cgu::Utf8::utf32_to_utf8 ( const std::u32string &  input)

Converts text from UTF-32/UCS-4 to narrow character UTF-8 format.

Parameters
input: Text in valid UTF-32 format.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-32/UCS-4 format or the system does not support wide character Unicode strings.
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
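To make the UTF-32 to UTF-8 mapping concrete, here is a hand-written sketch of the encoding step. It is not the library's implementation, the function name is hypothetical, and std::runtime_error stands in for Cgu::Utf8::ConversionError.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in showing how code points map to UTF-8 bytes.
std::string utf8_from_code_points(const std::u32string& input) {
    std::string out;
    for (char32_t cp : input) {
        // Surrogates and values above U+10FFFF are not valid UTF-32.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-32 code point");
        if (cp < 0x80) {                      // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {              // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {            // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                              // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

As with utf32_to_utf8(), invalid input is reported by an exception rather than by a partial result.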
bool Cgu::Utf8::validate ( const std::string &  text)
inline

Indicates whether the input text comprises valid UTF-8.

Parameters
text: The text to be tested.
Returns
true if the input text is in valid UTF-8 format, otherwise false.
Exceptions
std::bad_alloc: This function might throw std::bad_alloc if std::string::data() might throw when memory is exhausted.
Note
#include <c++-gtk-utils/convert.h> for this function.
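For readers unsure what "valid UTF-8" excludes, the following hand-written check illustrates the usual rules: correct lead and continuation bytes, no truncated or overlong sequences, no surrogates, and nothing above U+10FFFF. It is a sketch with a hypothetical name, not the implementation behind validate(), and the two may differ in edge cases.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-8 validity check.
bool is_valid_utf8(const std::string& text) {
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(text[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                     { len = 1; cp = b0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (n - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(text[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                       // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;  // overlong or too large
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                 // surrogate
        i += len;
    }
    return true;
}
```

Two-byte overlong forms are excluded by rejecting the lead bytes 0xC0 and 0xC1 outright.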
std::wstring Cgu::Utf8::wide_from_utf8 ( const std::string &  input)

Converts text from UTF-8 to the system's wide character locale representation. For this function to work correctly, the system's installed iconv() must support conversion to a generic wchar_t target, but in POSIX whether it does so is implementation defined (GNU's C library implementation does). For most unix-like systems the wide character representation will be Unicode (UCS-4/UTF-32 or UTF-16), and where that is the case use the uniwide_from_utf8() function instead, which does not rely on the generic target being available.

Parameters
input: Text in valid UTF-8 format.
Returns
The input text converted to the system's wide character locale representation.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in valid UTF-8 format, or cannot be converted to the system's wide character locale representation (e.g. because the input characters cannot be represented by that encoding, or the system's installed iconv() function does not support conversion to a generic wchar_t target).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.
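Whether the installed iconv() supports a given conversion can be probed with iconv_open(). The helper below is a hypothetical sketch, not part of c++-gtk-utils. The encoding name "WCHAR_T" used in the usage note is the name GNU implementations use for the generic wchar_t encoding. Other implementations may spell it differently or not provide it at all.

```cpp
#include <iconv.h>

// Hypothetical probe: can the installed iconv() convert between these
// two encodings?
bool iconv_supports(const char* to_code, const char* from_code) {
    iconv_t cd = iconv_open(to_code, from_code);
    // iconv_open() reports failure by returning (iconv_t)-1.
    if (cd == (iconv_t)(-1)) return false;
    iconv_close(cd);
    return true;
}
```

A program could call iconv_supports("WCHAR_T", "UTF-8") before relying on wide_from_utf8(), and fall back to uniwide_from_utf8() where the probe fails and the wide character representation is known to be Unicode.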
std::string Cgu::Utf8::wide_to_utf8 ( const std::wstring &  input)

Converts text from the system's wide character locale representation to UTF-8. For this function to work correctly, the system's installed iconv() must support conversion from a generic wchar_t target, but in POSIX whether it does so is implementation defined (GNU's C library implementation does). For most unix-like systems the wide character representation will be Unicode (UCS-4/UTF-32 or UTF-16), and where that is the case use the uniwide_to_utf8() function instead, which does not rely on the generic target being available.

Parameters
input: Text in a valid wide character locale format.
Returns
The input text converted to UTF-8.
Exceptions
Cgu::Utf8::ConversionError: This exception will be thrown if conversion fails because the input string is not in a valid wide character locale format, or cannot be converted to UTF-8 (e.g. because the system's installed iconv() function does not support conversion from a generic wchar_t target).
std::bad_alloc: This function might throw std::bad_alloc if memory is exhausted and the system throws in that case.