UNICODE_WORD_BREAK

Section: Courier Unicode Library (3)
Updated: 07/29/2015
 

NAME

unicode_wb_init, unicode_wb_next, unicode_wb_next_cnt, unicode_wb_end, unicode_wbscan_init, unicode_wbscan_next, unicode_wbscan_end - calculate word breaks  

SYNOPSIS

#include <courier-unicode.h>
unicode_wb_info_t unicode_wb_init(int (*cb_func)(int, void *), void *cb_arg);
int unicode_wb_next(unicode_wb_info_t wb, unicode_char c);
int unicode_wb_next_cnt(unicode_wb_info_t wb, const unicode_char *cptr, size_t cnt);
int unicode_wb_end(unicode_wb_info_t wb);
unicode_wbscan_info_t unicode_wbscan_init(void);
int unicode_wbscan_next(unicode_wbscan_info_t wbs, unicode_char c);
size_t unicode_wbscan_end(unicode_wbscan_info_t wbs);
 

DESCRIPTION

These functions implement the Unicode word breaking algorithm. Invoke unicode_wb_init() to initialize the word breaking algorithm. The first parameter is a callback function. The second parameter is an opaque pointer. The callback function gets invoked with two parameters. The first is the word breaking status described below; the second is the opaque pointer that was given to unicode_wb_init(). These functions do not interpret the opaque pointer in any way.

unicode_wb_init() returns an opaque handle. Repeated invocations of unicode_wb_next(), each passing the handle and one Unicode character, define the sequence of Unicode characters over which the word breaking calculation takes place. unicode_wb_next_cnt() is a shortcut for invoking unicode_wb_next() repeatedly over an array cptr containing cnt Unicode characters.

unicode_wb_end() denotes the end of the Unicode character sequence. After the call to unicode_wb_end(), the unicode_wb_info_t word breaking handle is no longer valid.

Between the call to unicode_wb_init() and unicode_wb_end(), the callback function gets invoked exactly once for each Unicode character given to unicode_wb_next() or unicode_wb_next_cnt(). Usually each call to unicode_wb_next() results in the callback function getting invoked immediately, but this is not guaranteed. It's possible that a call to unicode_wb_next() returns without invoking the callback function, and some subsequent call to unicode_wb_next() (or unicode_wb_end()) invokes the callback function more than once, to catch up. The contract is that before unicode_wb_end() returns, the callback function gets invoked exactly as many times as there are characters in the Unicode sequence defined by the intervening calls to unicode_wb_next() and unicode_wb_next_cnt(), unless an error occurs.

Each call to the callback function reports the calculated word breaking status of the corresponding character in the Unicode character sequence. If the first parameter to the callback function is non-zero, a word break is permitted before the corresponding character. A zero value indicates that a word break is prohibited before the corresponding character.

The callback function should return 0. A non-zero value indicates to the word breaking algorithm that an error has occurred. unicode_wb_next() and unicode_wb_next_cnt() return zero either if they never invoked the callback function, or if each call to the callback function returned zero. A non-zero return from the callback function results in unicode_wb_next() and unicode_wb_next_cnt() immediately returning the same value.

unicode_wb_end() must be invoked to destroy the word breaking handle even if unicode_wb_next() and unicode_wb_next_cnt() returned an error indication. It's also possible that, under normal circumstances, unicode_wb_end() invokes the callback function one or more times. The return value from unicode_wb_end() has the same meaning as from unicode_wb_next() and unicode_wb_next_cnt(); however, in all cases the word breaking handle is no longer valid after unicode_wb_end() returns.  

Word scan

unicode_wbscan_init(), unicode_wbscan_next() and unicode_wbscan_end() scan for the next word boundary in a Unicode character sequence. unicode_wbscan_init() obtains a handle, then unicode_wbscan_next() gets repeatedly invoked to define the Unicode character sequence. unicode_wbscan_end() deallocates the handle and returns the number of leading characters in the Unicode character sequence up to the first word break.

A non-zero return value from unicode_wbscan_next() indicates that the word boundary is already known, and any further calls to unicode_wbscan_next() will be ignored. unicode_wbscan_end() must still be called, to obtain the Unicode character count.  

SEE ALSO

TR-29[1], courier-unicode(7), unicode::wordbreak(3), unicode_convert_tocase(3), unicode_line_break(3), unicode_grapheme_break(3).  

AUTHOR

Sam Varshavchik

 

NOTES

1. TR-29
   http://www.unicode.org/reports/tr29/tr29-27.html


 
