c++-gtk-utils
A class for reassembling UTF-8 strings sent over pipes and sockets so they form complete valid UTF-8 characters.
#include <c++-gtk-utils/reassembler.h>
Public Member Functions
Cgu::SharedHandle<char*> operator()(const char* input, size_t size)
size_t get_stored() const noexcept
void reset() noexcept
Reassembler() noexcept
Utf8::Reassembler is a functor class which takes in a partially formed UTF-8 string and returns a nul-terminated string containing as much of the input as forms complete UTF-8 characters. Before doing so, it prepends any partially formed UTF-8 character left over at the end of the input passed in previous calls to the functor; any partially formed character at the end of the current input is in turn stored for the next call.
If the input, once any stored partial character has been prepended, contains invalid UTF-8 (disregarding any partially formed character at its end), operator() returns a null Cgu::SharedHandle<char*> object (that is, Cgu::SharedHandle<char*>::get() returns 0). Input is not treated as invalid if it consists only of a single partially formed UTF-8 character which could become valid once further bytes are received and added to it. In that case the returned Cgu::SharedHandle<char*> object contains an allocated string of zero length, comprising only a terminating \0 character, rather than a NULL pointer.
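To make that contract concrete, here is a minimal standalone sketch in C++17. It is not the library's implementation and does not use Cgu::SharedHandle: the hypothetical class ReassemblerSketch returns std::optional<std::string>, with std::nullopt standing in for a null SharedHandle and an empty string standing in for the zero-length result. It applies the RFC 3629 byte ranges, which may be stricter or looser than the library's own check, and it assumes that invalid input discards any stored bytes, which the description above does not specify.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical standalone analogue of Cgu::Utf8::Reassembler, for illustration only.
// std::nullopt plays the role of a null Cgu::SharedHandle<char*>.
class ReassemblerSketch {
public:
  std::optional<std::string> operator()(const char* input, std::size_t size) {
    std::string buf = stored_;   // prepend the partial character from earlier calls
    buf.append(input, size);
    stored_.clear();             // assumption: invalid input leaves nothing stored

    std::size_t i = 0;
    while (i < buf.size()) {
      const unsigned char lead = static_cast<unsigned char>(buf[i]);
      const std::size_t len = sequence_length(lead);
      if (len == 0) return std::nullopt;                 // not a valid lead byte
      const std::size_t avail = buf.size() - i;
      const std::size_t check = avail < len ? avail : len;
      for (std::size_t k = 1; k < check; ++k) {          // validate the bytes we have
        if (!continuation_ok(lead, k, static_cast<unsigned char>(buf[i + k])))
          return std::nullopt;
      }
      if (avail < len) {         // trailing partial character: keep it for next call
        stored_ = buf.substr(i);
        break;
      }
      i += len;
    }
    return buf.substr(0, i);     // complete characters only (may be empty)
  }

  std::size_t get_stored() const noexcept { return stored_.size(); }
  void reset() noexcept { stored_.clear(); }

private:
  std::string stored_;

  // Length of the sequence introduced by lead, or 0 if lead cannot start one.
  static std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
  }

  // Checks byte c at position k (k >= 1) of a sequence starting with lead,
  // rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF.
  static bool continuation_ok(unsigned char lead, std::size_t k, unsigned char c) {
    unsigned char lo = 0x80, hi = 0xBF;
    if (k == 1) {
      if (lead == 0xE0) lo = 0xA0;
      else if (lead == 0xED) hi = 0x9F;
      else if (lead == 0xF0) lo = 0x90;
      else if (lead == 0xF4) hi = 0x8F;
    }
    return c >= lo && c <= hi;
  }
};
```

Feeding "a" plus the first byte of "é" returns "a" with one byte stored; feeding the next byte then completes "é". Feeding only the first two bytes of "€" returns an empty string rather than std::nullopt, mirroring the zero-length-string case described above.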
This enables UTF-8 strings to be sent over pipes, sockets, etc. and displayed in a GTK object at the receiving end.
Note that for efficiency reasons the memory held in the returned Cgu::SharedHandle<char*> object may be greater than the length of the nul-terminated string that is contained in that memory: just let the Cgu::SharedHandle<char*> object manage the memory, and use the contents like any other nul-terminated string.
This class is not needed if std::getline(), with its default '\n' delimiter, is used to read UTF-8 characters using, say, Cgu::fdistream, because a whole '\n'-delimited line of UTF-8 characters will always be complete.
This is an example of its use, reading from a pipe until it is closed by the writer and putting the received text in a GtkTextBuffer object:
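A self-contained sketch of the same read-until-closed pattern, for a POSIX pipe. It deliberately avoids the library so that it compiles on its own: the hypothetical helper incomplete_tail() only holds back a trailing incomplete sequence and performs no validation (unlike Cgu::Utf8::Reassembler, which returns a null handle for invalid input), and the hypothetical sink callback stands in for inserting text into a GtkTextBuffer.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Hypothetical helper: how many trailing bytes of buf (0 to 3) begin a UTF-8
// sequence that is not yet complete. It performs no validation at all.
static std::size_t incomplete_tail(const std::string& buf) {
  const std::size_t n = buf.size();
  for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
    const unsigned char c = static_cast<unsigned char>(buf[n - back]);
    if ((c & 0xC0) == 0x80) continue;  // continuation byte: keep looking back
    const std::size_t len = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : c >= 0xC0 ? 2 : 1;
    return len > back ? back : 0;      // lead byte found: is its sequence cut short?
  }
  return 0;
}

// Reads fd until the writer closes it, passing only complete characters to sink
// (a stand-in for inserting into a GtkTextBuffer).
template <class Sink>
void drain_utf8(int fd, Sink&& sink) {
  std::string pending;  // bytes carried over between reads
  char buf[3];          // deliberately tiny, so reads split multi-byte characters
  for (;;) {
    const ssize_t res = ::read(fd, buf, sizeof buf);
    if (res == -1 && errno == EINTR) continue;  // interrupted by a signal: retry
    if (res <= 0) break;                        // 0: writer closed; -1: error
    pending.append(buf, static_cast<std::size_t>(res));
    const std::size_t keep = incomplete_tail(pending);
    if (keep < pending.size()) {
      sink(pending.substr(0, pending.size() - keep));
      pending.erase(0, pending.size() - keep);
    }
  }
  // Anything still in pending here is an incomplete final character.
}
```

With Cgu::Utf8::Reassembler, the role of pending is taken by the reassembler's stored partial character (reported by get_stored()), and a null result from operator() signals invalid input.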
Cgu::Utf8::Reassembler::Reassembler() noexcept [inline]
The constructor will not throw.
size_t Cgu::Utf8::Reassembler::get_stored() const noexcept [inline]
Gets the number of bytes of a partially formed UTF-8 character stored for the next call to operator()(). It will not throw.
Cgu::SharedHandle<char*> Cgu::Utf8::Reassembler::operator()(const char* input, size_t size)
Takes a byte array of wholly or partly formed UTF-8 characters to be converted (after taking account of previous calls to the method) to a valid string of wholly formed characters.
Parameters
input: The input array.
size: The number of bytes in the input (not the number of UTF-8 characters).
Exceptions
std::bad_alloc: The method might throw std::bad_alloc if memory is exhausted and the system throws in that case. It will not throw any other exception.
void Cgu::Utf8::Reassembler::reset() noexcept [inline]
Resets the Reassembler by discarding any partially formed UTF-8 character stored from previous calls to operator()(). It will not throw.