UNICODE_LINE_BREAK

Section: Courier Unicode Library (3)
Updated: 07/29/2015
 

NAME

unicode_lb_init, unicode_lb_set_opts, unicode_lb_next, unicode_lb_next_cnt, unicode_lb_end, unicode_lbc_init, unicode_lbc_set_opts, unicode_lbc_next, unicode_lbc_next_cnt, unicode_lbc_end - calculate mandatory or allowed line breaks  

SYNOPSIS

#include <courier-unicode.h>
unicode_lb_info_t unicode_lb_init(int (*cb_func)(int, void *), void *cb_arg);
void unicode_lb_set_opts(unicode_lb_info_t lb, int opts);
int unicode_lb_next(unicode_lb_info_t lb, unicode_char c);
int unicode_lb_next_cnt(unicode_lb_info_t lb, const unicode_char *cptr, size_t cnt);
int unicode_lb_end(unicode_lb_info_t lb);
unicode_lbc_info_t unicode_lbc_init(int (*cb_func)(int, unicode_char, void *), void *cb_arg);
void unicode_lbc_set_opts(unicode_lbc_info_t lb, int opts);
int unicode_lbc_next(unicode_lbc_info_t lb, unicode_char c);
int unicode_lbc_next_cnt(unicode_lbc_info_t lb, const unicode_char *cptr, size_t cnt);
int unicode_lbc_end(unicode_lbc_info_t lb);
 

DESCRIPTION

These functions implement the unicode line breaking algorithm. Invoke unicode_lb_init() to initialize the line breaking algorithm. The first parameter is a callback function. The second parameter is an opaque pointer. The callback function gets invoked with two parameters. The first parameter is one of three values: UNICODE_LB_MANDATORY, UNICODE_LB_NONE, or UNICODE_LB_ALLOWED, as described below. The second parameter is the opaque pointer that was passed to unicode_lb_init(); the opaque pointer is not subject to any further interpretation by these functions.

unicode_lb_init() returns an opaque handle. Repeated invocations of unicode_lb_next(), passing the handle and one unicode character at a time, define a sequence of unicode characters over which the line breaking algorithm calculation takes place. unicode_lb_next_cnt() is a shortcut for invoking unicode_lb_next() repeatedly over an array cptr containing cnt unicode characters.

unicode_lb_end() denotes the end of the unicode character sequence. After the call to unicode_lb_end() the line breaking unicode_lb_info_t handle is no longer valid.

Between the call to unicode_lb_init() and unicode_lb_end(), the callback function gets invoked exactly once for each unicode character given to unicode_lb_next() or unicode_lb_next_cnt(). Usually each call to unicode_lb_next() results in the callback function getting invoked immediately, but this is not guaranteed. A call to unicode_lb_next() may return without invoking the callback function, and some subsequent call to unicode_lb_next() (or unicode_lb_end()) then invokes the callback function more than once, to catch up. The contract is that, unless an error occurs, by the time unicode_lb_end() returns the callback function has been invoked exactly as many times as there are characters in the unicode sequence defined by the intervening calls to unicode_lb_next() and unicode_lb_next_cnt().
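One way to consume this contract is to record every reported status in a buffer passed as the opaque pointer. After unicode_lb_end() returns zero, entry i then holds the status for character i, assuming (as the "corresponding character" wording suggests) that the calls arrive in input order. The collector below is a hypothetical sketch, not part of the library; the fallback status values exist only so it compiles without <courier-unicode.h> and are not claimed to match the header.

```c
#include <stddef.h>

/* Placeholder values so this sketch compiles on its own. In real code
   <courier-unicode.h> defines these; the numbers here are assumptions. */
#ifndef UNICODE_LB_MANDATORY
#define UNICODE_LB_MANDATORY (-1)
#define UNICODE_LB_NONE 0
#define UNICODE_LB_ALLOWED 1
#endif

/* Hypothetical collector passed as the opaque cb_arg. */
struct lb_record {
    int *status;     /* caller-supplied array, one slot per input character */
    size_t capacity; /* number of slots in status[] */
    size_t count;    /* callbacks received so far */
};

/* Callback with the int (*)(int, void *) shape taken by unicode_lb_init().
   Stores each reported status in arrival order. Returns non-zero when the
   buffer is full, so the error is reported instead of overflowing. */
static int record_break(int status, void *arg)
{
    struct lb_record *rec = arg;

    if (rec->count >= rec->capacity)
        return -1;
    rec->status[rec->count++] = status;
    return 0;
}
```

It would be registered as unicode_lb_init(record_break, &rec). Returning -1 on a full buffer relies on the rule, stated in this page, that any non-zero callback result is passed back as an error.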

Each call to the callback function reports the calculated line breaking status of the corresponding character in the unicode character sequence:

UNICODE_LB_MANDATORY

A line break is MANDATORY before the corresponding character.

UNICODE_LB_NONE

A line break is PROHIBITED before the corresponding character.

UNICODE_LB_ALLOWED

A line break is OPTIONAL before the corresponding character.
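With the statuses collected, greedy line wrapping becomes a single pass: start a new line at every UNICODE_LB_MANDATORY position, and when a line would exceed the target width, break at the last UNICODE_LB_ALLOWED position seen on it. This is a hypothetical helper, not part of the library. It assumes each character occupies one column (real display width differs for wide and combining characters) and counts trailing spaces instead of letting them hang; the fallback status values are placeholders, not the header's.

```c
#include <stddef.h>

/* Placeholder values so this sketch compiles on its own; the real ones
   come from <courier-unicode.h>. */
#ifndef UNICODE_LB_MANDATORY
#define UNICODE_LB_MANDATORY (-1)
#define UNICODE_LB_NONE 0
#define UNICODE_LB_ALLOWED 1
#endif

/* Hypothetical greedy wrapper. status[i] is the break status reported
   before character i. Writes the index of the first character of each
   line to starts[] (at most max_starts entries) and returns the number
   of lines. Assumes one column per character. */
static size_t wrap_lines(const int *status, size_t n, size_t width,
                         size_t *starts, size_t max_starts)
{
    size_t lines = 0;
    size_t line_start = 0;
    size_t candidate = 0; /* last allowed break seen; usable only if > line_start */

    if (n == 0 || max_starts == 0)
        return 0;
    starts[lines++] = 0;

    for (size_t i = 1; i < n; i++) {
        size_t new_start = 0; /* 0 means: no break before character i */

        if (status[i] == UNICODE_LB_MANDATORY) {
            new_start = i;
        } else {
            if (status[i] == UNICODE_LB_ALLOWED)
                candidate = i;
            /* Character i does not fit: fall back to the last break
               opportunity on this line, if there is one. */
            if (i - line_start + 1 > width && candidate > line_start)
                new_start = candidate;
        }

        if (new_start != 0) {
            if (lines == max_starts)
                break; /* no room to record more lines */
            starts[lines++] = new_start;
            line_start = new_start;
        }
    }
    return lines;
}
```

For "ab cd ef" with breaks allowed before "c" and "e" and a width of 5, the helper returns two line starts, 0 and 3, i.e. "ab " and "cd ef".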

The callback function should return 0. A non-zero value indicates to the line breaking algorithm that an error has occurred. unicode_lb_next() and unicode_lb_next_cnt() return zero either if they never invoked the callback function, or if each call to the callback function returned zero. A non-zero return from the callback function results in unicode_lb_next() and unicode_lb_next_cnt() immediately returning the same value.

unicode_lb_end() must be invoked to destroy the line breaking handle even if unicode_lb_next() and unicode_lb_next_cnt() returned an error indication. It is also normal for unicode_lb_end() to invoke the callback function one or more times. The return value from unicode_lb_end() has the same meaning as from unicode_lb_next() and unicode_lb_next_cnt(); however, in all cases, the line breaking handle is no longer valid after unicode_lb_end() returns.
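The return-value rules can be modeled as a loop that feeds characters one at a time, stops at the first non-zero result and returns it unchanged, which is how unicode_lb_next_cnt() is described to behave. The step function type and the demo step below are hypothetical stand-ins for unicode_lb_next(), used so the sketch runs without the library; this is not the library's implementation, and a real program would still call unicode_lb_end() after an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-character step; in the library this role is played by
   unicode_lb_next(), whose non-zero result comes from the callback. */
typedef int (*lb_step_fn)(void *handle, uint32_t c);

/* Model of the return-value rule: feed cnt characters one at a time,
   stop at the first non-zero result and return it unchanged; return 0
   if every step succeeded (or cnt is 0). */
static int feed_chars(lb_step_fn step, void *handle,
                      const uint32_t *cptr, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++) {
        int rc = step(handle, cptr[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Hypothetical demo step: counts characters and reports error code 2
   on a newline, standing in for a callback that rejects its input. */
static int stop_at_newline(void *handle, uint32_t c)
{
    size_t *seen = handle;

    ++*seen;
    return c == '\n' ? 2 : 0;
}
```

Feeding "ab\ncd" through this model returns 2 after three characters; the two characters after the newline are never processed.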

Alternative callback function

unicode_lbc_init(), unicode_lbc_next(), unicode_lbc_next_cnt(), and unicode_lbc_end() are alternative functions that implement the same algorithm. The only difference is that the callback function receives an extra parameter: the unicode character value to which the line breaking status applies, passed through from the input unicode character sequence.
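Because the unicode_lbc_*() callback sees each character together with its status, it can rebuild the text with break markers as it goes. In this hypothetical sketch, uint32_t stands in for unicode_char (assumed to be a 32-bit code point type), the '!' and '|' markers are arbitrary choices, and the fallback status values are placeholders, not the header's.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values so this sketch compiles on its own; the real ones
   come from <courier-unicode.h>. */
#ifndef UNICODE_LB_MANDATORY
#define UNICODE_LB_MANDATORY (-1)
#define UNICODE_LB_NONE 0
#define UNICODE_LB_ALLOWED 1
#endif

/* Hypothetical output buffer passed as cb_arg. */
struct marked_text {
    uint32_t *out;
    size_t capacity;
    size_t len;
};

/* Append one code point; returns -1 when the buffer is full. */
static int put_cp(struct marked_text *t, uint32_t cp)
{
    if (t->len >= t->capacity)
        return -1;
    t->out[t->len++] = cp;
    return 0;
}

/* Callback in the shape of int (*)(int, unicode_char, void *): copies each
   character through, preceded by '!' where a break is mandatory and '|'
   where a break is allowed. A full buffer yields a non-zero result. */
static int mark_breaks(int status, uint32_t c, void *arg)
{
    struct marked_text *t = arg;
    int rc = 0;

    if (status == UNICODE_LB_MANDATORY)
        rc = put_cp(t, '!');
    else if (status == UNICODE_LB_ALLOWED)
        rc = put_cp(t, '|');
    return rc != 0 ? rc : put_cp(t, c);
}
```

It would be registered as unicode_lbc_init(mark_breaks, &t). Assuming the same error rules as the unicode_lb_*() functions, a full buffer makes the -1 come back from unicode_lbc_next() as an error.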

Options

unicode_lb_set_opts() and unicode_lbc_set_opts() enable non-default options for the line breaking algorithm. These functions must be called immediately after unicode_lb_init() or unicode_lbc_init(), and before any other function. opts is a bitmask that can contain the following values:

UNICODE_LB_OPT_PRBREAK

Enables a modified LB24 rule. This prevents plus signs, as in "C++", from breaking. This flag adds the following rules to the LB24 rule:

                        PR x PR

                        AL x PR

                        ID x PR

UNICODE_LB_OPT_SYBREAK

Tailored breaking rules for the "/" character. This prevents breaking after the "/" character (think URLs), including an exception to the "x SY" rule in LB13. This flag adds the following rules to the LB24 rule:

                        SY x EX

                        SY x AL

                        SY x ID

                        SP ÷ SY, which takes precedence over "x SY".

UNICODE_LB_OPT_DASHWJ

This flag reclassifies U+2013 and U+2014 as class WJ, prohibiting breaks before and after the en dash (U+2013) and the em dash (U+2014).
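Since opts is a bitmask, the flags combine with bitwise OR. The helper below is hypothetical, and its fallback bit values exist only so it compiles on its own; they are not claimed to match <courier-unicode.h>.

```c
/* Placeholder bit values so this sketch compiles on its own; the real
   ones come from <courier-unicode.h>. */
#ifndef UNICODE_LB_OPT_PRBREAK
#define UNICODE_LB_OPT_PRBREAK 0x0001
#define UNICODE_LB_OPT_SYBREAK 0x0002
#define UNICODE_LB_OPT_DASHWJ  0x0004
#endif

/* Hypothetical helper: build the opts bitmask for unicode_lb_set_opts()
   or unicode_lbc_set_opts() from three yes/no choices. */
static int choose_lb_opts(int keep_plus_signs, int keep_slashes, int glue_dashes)
{
    int opts = 0;

    if (keep_plus_signs)
        opts |= UNICODE_LB_OPT_PRBREAK; /* e.g. keep "C++" together */
    if (keep_slashes)
        opts |= UNICODE_LB_OPT_SYBREAK; /* no break after "/" */
    if (glue_dashes)
        opts |= UNICODE_LB_OPT_DASHWJ;  /* U+2013/U+2014 treated as WJ */
    return opts;
}
```

The result would be applied as unicode_lb_set_opts(lb, choose_lb_opts(1, 1, 0)) immediately after unicode_lb_init(), before the first call to unicode_lb_next().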
 

SEE ALSO

courier-unicode(7), unicode::linebreak(3), TR-14[1]

AUTHOR

Sam Varshavchik

Author
 

NOTES

1.
TR-14
http://www.unicode.org/reports/tr14/tr14-35.html


 
