c++-gtk-utils
Cgu::Utf8::Reassembler Class Reference

A class for reassembling UTF-8 strings sent over pipes and sockets so they form complete valid UTF-8 characters.

#include <c++-gtk-utils/reassembler.h>

Public Member Functions

Cgu::SharedHandle< char * > operator() (const char *input, size_t size)
size_t get_stored () const
void reset ()
 Reassembler ()

Detailed Description

Utf8::Reassembler is a functor class which takes in a partially formed UTF-8 string and returns a null-terminated string containing only complete UTF-8 characters. On each call it first inserts, at the beginning of the input, any partially formed UTF-8 character which was at the end of the input passed in previous calls. It then returns as much of the result as forms complete UTF-8 characters, and stores any partial character at the end for the next call to the functor. If the input string, after adding any stored previous part character, contains invalid UTF-8 (apart from any partially formed character at the end of the input string), then operator() will return a null Cgu::SharedHandle<char*> object (that is, Cgu::SharedHandle<char*>::get() will return 0). Input is not treated as invalid if it consists only of a single partly formed UTF-8 character which could be valid if further bytes were received and added to it: in that case the returned Cgu::SharedHandle<char*> object will contain an allocated string of zero length (apart from the terminating 0 character), rather than a NULL pointer.
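
The step of holding back a partial character at the end of the input can be pictured with a small stand-alone sketch. This is not the library's implementation: the helper names sequence_length and incomplete_tail_length are hypothetical, and the sketch only locates a trailing partial character, leaving full validation of the rest of the input aside.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: total length of a UTF-8 sequence judged from its
// lead byte. Returns 0 for a continuation byte or for a byte that can
// never start a valid sequence (0xC0, 0xC1, 0xF5 and above).
std::size_t sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;                   // ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;  // two-byte sequence
  if (lead >= 0xE0 && lead <= 0xEF) return 3;  // three-byte sequence
  if (lead >= 0xF0 && lead <= 0xF4) return 4;  // four-byte sequence
  return 0;
}

// Hypothetical helper: number of bytes at the end of 'data' which begin a
// character whose remaining bytes have not arrived yet. Returns 0 if the
// data ends on a character boundary, or if the tail cannot be the start
// of a character (that case is left to a separate validity check).
std::size_t incomplete_tail_length(const std::string& data) {
  const std::size_t n = data.size();
  // A partial character holds at most 3 bytes (a 4-byte sequence missing
  // its last byte), so only the last 3 bytes need inspecting.
  for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
    const unsigned char c = static_cast<unsigned char>(data[n - back]);
    if ((c & 0xC0) == 0x80) continue;  // continuation byte: look further back
    const std::size_t need = sequence_length(c);
    return (need > back) ? back : 0;   // incomplete only if bytes are missing
  }
  return 0;
}
```

Under these assumptions, incomplete_tail_length("caf\xC3") yields 1, because 0xC3 starts a two-byte character whose second byte is still missing. A reassembler built this way would pass on "caf" and keep the single byte for the next call.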

This enables UTF-8 strings to be sent over pipes, sockets, etc. and displayed in a GTK+ object at the receiving end.

Note that for efficiency reasons the memory held in the returned Cgu::SharedHandle<char*> object may be greater than the length of the null-terminated string that is contained in that memory: just let the Cgu::SharedHandle<char*> object manage the memory, and use the contents like any other null-terminated string.

This class is not needed if std::getline(), with its default '\n' delimiter, is used to read UTF-8 characters using, say, Cgu::fdistream, because a whole '\n' delimited line of UTF-8 characters will always be complete.
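
The reason line-based reading is safe is that every byte of a multi-byte UTF-8 character has its high bit set, so the '\n' byte (0x0A) can never occur inside one. The sketch below illustrates this with a std::istringstream standing in for a stream such as Cgu::fdistream; the function name read_utf8_lines is hypothetical and chosen only for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads '\n'-delimited lines from any std::istream.
// Because '\n' never appears inside a multi-byte UTF-8 character, each
// line ends on a character boundary, provided the sender wrote complete
// UTF-8 text.
std::vector<std::string> read_utf8_lines(std::istream& in) {
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(in, line)) {  // default delimiter is '\n'
    lines.push_back(line);
  }
  return lines;
}
```

Any stream derived from std::istream can be passed in, so the same loop would apply to a file-descriptor-backed stream reading from a pipe.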

This is an example of its use, reading from a pipe until it is closed by the writer and putting the received text in a GtkTextBuffer object:

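   // Headers needed by this fragment:
   //   <unistd.h> (read), <cerrno> (errno, EINTR), <cstring> (std::strlen),
   //   <iostream> (std::cerr), <gtk/gtk.h>, <c++-gtk-utils/reassembler.h>
   // 'fd' is the read end of the pipe and 'text_view' is a GtkTextView
   // widget, both set up elsewhere.
   using namespace Cgu;

   GtkTextIter end;
   GtkTextBuffer* text_buffer = gtk_text_view_get_buffer(GTK_TEXT_VIEW(text_view));
   gtk_text_buffer_get_end_iter(text_buffer, &end);

   Utf8::Reassembler reassembler;
   const int BSIZE = 1024;
   char read_buffer[BSIZE];
   ssize_t res;
   do {
     res = ::read(fd, read_buffer, BSIZE);
     if (res > 0) {
       // a null handle signals invalid UTF-8; an empty string means only
       // part of a character has arrived so far
       SharedHandle<char*> utf8(reassembler(read_buffer, res));
       if (utf8.get()) {
         gtk_text_buffer_insert(text_buffer, &end,
                                utf8.get(), std::strlen(utf8.get()));
       }
       else std::cerr << "Invalid utf8 text sent over pipe\n";
     }
     // stop at end of file (res == 0) or on an error other than EINTR
   } while (res && (res != -1 || errno == EINTR));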

Constructor & Destructor Documentation

Cgu::Utf8::Reassembler::Reassembler ( ) [inline]

The constructor will not throw.


Member Function Documentation

size_t Cgu::Utf8::Reassembler::get_stored ( ) const [inline]

Gets the number of bytes of a partially formed UTF-8 character stored for the next call to operator()(). It will not throw.

Returns:
The number of bytes.

Cgu::SharedHandle<char*> Cgu::Utf8::Reassembler::operator() ( const char * input, size_t size )

Takes a byte array of wholly or partly formed UTF-8 characters to be converted (after taking account of previous calls to the method) to a valid string of wholly formed characters.

Parameters:
input  The input array.
size   The number of bytes in the input (not the number of UTF-8 characters).
Returns:
A Cgu::SharedHandle<char*> object holding a null-terminated string made up of the complete UTF-8 characters in the input, after inserting at the beginning any partially formed UTF-8 character which was at the end of the input passed in previous calls to the functor. Any partial character at the end is stored for the next call to the functor. If the input is invalid after such recombination, then a null Cgu::SharedHandle<char*> object is returned (that is, Cgu::SharedHandle<char*>::get() will return 0). Input is not treated as invalid if it consists only of a single partly formed UTF-8 character which could be valid if further bytes were received and added to it: in that case the returned Cgu::SharedHandle<char*> object will contain an allocated string of zero length (apart from the terminating 0 character), rather than a NULL pointer.
Exceptions:
std::bad_alloc  The method might throw std::bad_alloc if memory is exhausted and the system throws in that case. It will not throw any other exception.
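
The three outcomes described for operator()() can be told apart by looking only at the pointer obtained from Cgu::SharedHandle<char*>::get(). The following is a minimal sketch of that caller-side check; the enum ChunkResult and the function classify are hypothetical names introduced for illustration and are not part of the library.

```cpp
// Hypothetical names, for illustration only.
enum class ChunkResult {
  Invalid,          // null pointer: the recombined input was invalid UTF-8
  NothingComplete,  // zero-length string: no complete character available yet
  Text              // at least one complete character to display
};

// 'result' stands for the pointer obtained from
// Cgu::SharedHandle<char*>::get() on the handle returned by operator()().
ChunkResult classify(const char* result) {
  if (!result) return ChunkResult::Invalid;
  if (*result == '\0') return ChunkResult::NothingComplete;
  return ChunkResult::Text;
}
```

Treating the zero-length case separately from the null case matters because the former is not an error: the caller simply waits for more bytes.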

void Cgu::Utf8::Reassembler::reset ( ) [inline]

Resets the Reassembler, by discarding any partially formed UTF-8 character from previous calls to operator()(). It will not throw.

