unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init, unicode_convert, unicode_convert_deinit, unicode_convert_tocbuf_init, unicode_convert_tou_init, unicode_convert_fromu_init, unicode_convert_uc, unicode_convert_tocbuf_toutf8_init, unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8, unicode_convert_fromutf8, unicode_convert_tobuf, unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf - unicode character set conversion
unicode_u_ucs4_native[] contains the string "UCS-4BE" or "UCS-4LE", matching the native unicode_char endianness.
unicode_u_ucs2_native[] contains the string "UCS-2BE" or "UCS-2LE", matching the native unicode_char endianness.
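To illustrate what "matching the native endianness" means, the following minimal sketch picks the equivalent UCS-4 character set name at run time. The helper native_ucs4_name() is hypothetical and is not part of this library, which already provides the result as unicode_u_ucs4_native[].

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a library function): returns the UCS-4
 * character set name whose byte order matches this machine's layout
 * of a 32-bit integer. */
static const char *native_ucs4_name(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    /* Look at the first byte, in memory order, of the value 1. */
    memcpy(&first, &probe, 1);

    /* Little-endian machines store the least significant byte first. */
    return first == 1 ? "UCS-4LE" : "UCS-4BE";
}
```

The same test with a 16-bit probe would select between "UCS-2LE" and "UCS-2BE".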
unicode_convert_init(), unicode_convert(), and unicode_convert_deinit() are an adaptation of the iconv(3)[1] API that uses the same calling convention as the other algorithms in this unicode library, with some value-added features. These functions use iconv(3) to perform the actual character set conversion.
unicode_convert_init() returns a non-NULL handle for the requested conversion, or NULL if the requested conversion is not available.
unicode_convert_init() takes a pointer to the output function that receives the converted character text. The output function receives a pointer to the converted character text and the number of characters in it. The output function gets called repeatedly until it has received the entire converted text.
The character text to convert gets passed, repeatedly, to
unicode_convert(). Each call to
unicode_convert() results in the output function getting invoked, zero or more times, with each successive part of the converted text. Finally,
unicode_convert_deinit() stops the conversion and deallocates the conversion handle.
A call to unicode_convert_deinit() may result in some additional calls to the output function, passing the remaining, final parts of the converted text, before unicode_convert_deinit() deallocates the handle and returns.
The output function should return 0 normally. A non-zero return indicates an error condition.
unicode_convert_deinit() returns non-zero if any previous invocation of the output function returned non-zero (this includes any invocations of the output function resulting from this call, or prior
unicode_convert() calls), or 0 if all invocations of the output function returned 0.
If errptr is not NULL, *errptr gets set to non-zero if there were any conversion errors, that is, if any text could not be converted to the destination character set.
unicode_convert() also returns non-zero if it calls the output function and the output function returns non-zero. The conversion handle remains allocated in that case, so unicode_convert_deinit() must still be called to release it.
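The calling convention described above can be modeled with a small wrapper around iconv(3). This is only a sketch under stated assumptions: every demo_ name is hypothetical, the prototypes are illustrative rather than the library's own, and the model is simplified (each input chunk must end on a character boundary, and an unconvertible byte is simply skipped).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: a simplified model, not this library's code. */
typedef int (*demo_output_fn)(const char *text, size_t cnt, void *arg);

struct demo_conv {
    iconv_t cd;
    demo_output_fn out;
    void *arg;
    int out_failed;  /* sticky: some output call returned non-zero */
    int conv_error;  /* some input could not be converted */
};

/* Pass converted bytes to the output function; remember a failure. */
static void demo_emit(struct demo_conv *h, const char *buf, size_t n)
{
    if (n > 0 && !h->out_failed && h->out(buf, n, h->arg) != 0)
        h->out_failed = 1;
}

static struct demo_conv *demo_init(const char *src, const char *dst,
                                   demo_output_fn out, void *arg)
{
    struct demo_conv *h = malloc(sizeof *h);

    if (!h)
        return NULL;
    h->cd = iconv_open(dst, src);   /* iconv_open() takes the destination first */
    if (h->cd == (iconv_t)-1) {
        free(h);
        return NULL;                /* requested conversion is not available */
    }
    h->out = out;
    h->arg = arg;
    h->out_failed = h->conv_error = 0;
    return h;
}

/* Simplification: each chunk must end on a character boundary. */
static int demo_convert(struct demo_conv *h, const char *text, size_t cnt)
{
    char *in = (char *)text;        /* iconv() takes char **; input is not modified */
    char buf[64];

    while (cnt > 0 && !h->out_failed) {
        char *outp = buf;
        size_t left = sizeof buf;
        size_t rc = iconv(h->cd, &in, &cnt, &outp, &left);
        int saved = errno;          /* the output function may change errno */

        demo_emit(h, buf, (size_t)(outp - buf));
        if (rc == (size_t)-1 && saved != E2BIG) {
            h->conv_error = 1;      /* skip one unconvertible byte */
            ++in;
            --cnt;
        }
    }
    return h->out_failed;           /* the handle stays allocated either way */
}

static int demo_deinit(struct demo_conv *h, int *errptr)
{
    char buf[64];
    char *outp = buf;
    size_t left = sizeof buf;
    int rc;

    /* Flush any pending shift sequence; this may produce final output. */
    iconv(h->cd, NULL, NULL, &outp, &left);
    demo_emit(h, buf, (size_t)(outp - buf));
    rc = h->out_failed;
    if (errptr)
        *errptr = h->conv_error;
    iconv_close(h->cd);
    free(h);
    return rc;
}

/* Example output function: adds up the number of bytes received. */
static int count_bytes(const char *text, size_t cnt, void *arg)
{
    (void)text;
    *(size_t *)arg += cnt;
    return 0;
}
```

For example, with a size_t total initialized to 0, calling demo_init("UTF-8", "UCS-4LE", count_bytes, &total), feeding "ab" and then "c" through demo_convert(), and finishing with demo_deinit() leaves total at 12: four bytes for each of the three characters.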
Collecting converted text into a buffer
Call
unicode_convert_tocbuf_init() instead of
unicode_convert_init(), then call
unicode_convert() and
unicode_convert_deinit() normally. The parameters to
unicode_convert_init() specify the source and the destination character sets.
unicode_convert_tocbuf_toutf8_init() is just an alias that specifies
UTF-8
as the destination character set.
unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies
UTF-8
as the source character set.
These functions supply an output function that collects the converted text into a malloc()ed buffer. If unicode_convert_deinit() returns 0, *cbufptr_ret gets set to the malloc()ed buffer, and *cbufsize_ret gets set to the number of converted characters, which is the size of the malloc()ed buffer.
Note: If the converted string is an empty string, *cbufsize_ret gets set to 0, but *cbufptr_ret still gets initialized (to a dummy malloc()ed buffer).
A non-zero
nullterminate
places a trailing \0 character after the converted string (this is included in *cbufsize_ret).
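An output function of this kind could look roughly like the sketch below. It is an illustration only: struct collector, collect() and collect_finish() are hypothetical names, not library functions, and the library's own collector may be implemented differently.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical collector state; the names are illustrative only. */
struct collector {
    char *buf;     /* malloc()ed buffer, NULL until first use */
    size_t size;   /* number of bytes stored so far */
};

/* Output function: appends cnt bytes to the buffer.
 * Returns 0, or -1 if memory could not be allocated. */
static int collect(const char *text, size_t cnt, void *arg)
{
    struct collector *c = arg;

    /* The extra byte leaves room for a terminator and guarantees a
     * real buffer even when nothing has been collected. */
    char *p = realloc(c->buf, c->size + cnt + 1);

    if (!p)
        return -1;
    memcpy(p + c->size, text, cnt);
    c->buf = p;
    c->size += cnt;
    return 0;
}

/* Called once after the conversion ends. A non-zero nullterminate
 * appends a '\0' and counts it in the reported size. */
static int collect_finish(struct collector *c, int nullterminate)
{
    if (collect("", 0, c) != 0)   /* make sure the buffer exists */
        return -1;
    if (nullterminate) {
        c->buf[c->size] = '\0';
        c->size++;
    }
    return 0;
}
```

Starting from an empty collector, collecting "he" and "llo" and then finishing with a non-zero nullterminate yields the string "hello" with a size of 6, the terminator included.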
Converting between character sets and unicode
unicode_convert_tou_init() converts character text into a
unicode_char
buffer. It works just like
unicode_convert_tocbuf_init(), except that only the source character set gets specified and the output buffer is a
unicode_char
buffer.
A non-zero nullterminate terminates the converted unicode characters with a U+0000.
unicode_convert_fromu_init() converts
unicode_chars to the output character set, and also works like
unicode_convert_tocbuf_init(). Additionally, in this case,
unicode_convert_uc() works just like
unicode_convert() except that the input sequence is a
unicode_char
sequence, and the count parameter is the number of unicode characters.
One-shot conversions
unicode_convert_toutf8() converts the specified text, in the specified character set, into a UTF-8 string, returning a malloc()ed buffer. If error is not NULL, *error gets set to a non-zero value if a character conversion error has occurred and some characters could not be converted, even if unicode_convert_toutf8() returns a non-NULL value.
unicode_convert_fromutf8() does a similar conversion from UTF-8
text
to the specified character set.
unicode_convert_tobuf() does a similar conversion between two different character sets.
unicode_convert_tou_tobuf() calls
unicode_convert_tou_init(), feeds the character string through
unicode_convert(), then calls
unicode_convert_deinit(). If this function returns 0, *uc and *ucsize are set to a malloc()ed buffer and its size, holding the unicode_char array.
unicode_convert_fromu_tobuf() calls
unicode_convert_fromu_init(), feeds the unicode array through
unicode_convert_uc(), then calls unicode_convert_deinit(). If this function returns 0, *c and *csize are set to a malloc()ed buffer and its size, holding the char array.
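The behavior described for the one-shot string conversions can be approximated directly with iconv(3), as in the sketch below. sketch_tobuf() is a hypothetical stand-in modeled on the description of unicode_convert_tobuf(); its name and parameter order are assumptions, it assumes a byte-oriented destination character set, and it omits the final flush that stateful encodings would need.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in modeled on the description of
 * unicode_convert_tobuf(); not the library's implementation. */
static char *sketch_tobuf(const char *text, const char *src_chset,
                          const char *dst_chset, int *error)
{
    size_t inleft = strlen(text);
    size_t cap = inleft * 4 + 16;   /* initial guess, grown on demand */
    size_t outleft = cap;
    char *in = (char *)text;        /* iconv() takes char **; input is not modified */
    char *buf, *out;
    iconv_t cd;

    if (error)
        *error = 0;
    cd = iconv_open(dst_chset, src_chset);
    if (cd == (iconv_t)-1)
        return NULL;                /* conversion not available */
    buf = out = malloc(cap + 1);    /* +1 for the trailing '\0' */
    if (!buf) {
        iconv_close(cd);
        return NULL;
    }

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1)
            break;                  /* all input consumed */
        if (errno == E2BIG) {       /* output buffer full: double it */
            size_t used = (size_t)(out - buf);
            char *p = realloc(buf, cap * 2 + 1);

            if (!p) {
                free(buf);
                iconv_close(cd);
                return NULL;
            }
            buf = p;
            out = p + used;
            outleft += cap;
            cap *= 2;
        } else {                    /* EILSEQ or EINVAL: skip one byte */
            if (error)
                *error = 1;         /* a result is still returned */
            ++in;
            --inleft;
        }
    }
    iconv_close(cd);
    *out = '\0';
    return buf;                     /* caller releases it with free() */
}
```

As with the functions described above, a non-NULL result does not by itself mean a clean conversion: the caller checks the error flag separately and frees the returned buffer when done.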
SEE ALSO
courier-unicode(7),
unicode_convert_tocase(3),
unicode_default_chset(3).
AUTHOR
Sam Varshavchik
NOTES
1. iconv(3)
   http://manpages.courier-mta.org/htmlman3/iconv.3.html