Alejandro Colomar wrote:
> > Do you have a better wording than "can ... in some cases"?
>
> If you include the full version in the commit log, to be able to
> understand it in the future, I'm fine with it.

OK. Here is a patch with the details included in the commit message.

From 4cc4ad011b3ffa30159d3a67e262a46da4600cba Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@xxxxxxxxx>
Date: Sun, 21 May 2023 13:05:29 +0200
Subject: [PATCH] List a fifth condition when iconv(3) may stop.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The wording regarding transliteration is vague, because this man page is
not the right place for going into the details of the transliteration.
Here are the details:

GNU libc and GNU libiconv support transliteration, for example, of "½" to
"1/2", or of "å" to "aa" in a Danish locale.

The transliteration maps a multibyte character of the input encoding to
zero or more characters in the output.

There are two kinds of transliteration rules:
  - Those that are valid regardless of locale. Typically this means that
    the original and the transliterated character have similar glyphs,
    such as in the case "½" to "1/2". In GNU libc, these are collected
    in the files glibc/localedata/locales/translit_*.
  - Those that are valid in a single locale only. Often such a rule
    reflects similar pronunciation of the original and the transliterated
    characters. Some locales have script-based transliteration, for
    example from the Cyrillic script to the Latin script. In GNU libc,
    these are collected in the file glibc/localedata/locales/<locale>.
    In GNU libiconv, transliterations of this kind are not supported.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso <steffen@xxxxxxxxxx>
Reported-by: Reuben Thomas <rrt@xxxxxxxx>
Signed-off-by: Bruno Haible <bruno@xxxxxxxxx>
---
 man3/iconv.3 | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..94441f602 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful,
 the function can also convert a sequence of input bytes to an update to the
 conversion state without producing any output bytes; such input is called
 a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
 .IP \[bu] 3
 An invalid multibyte sequence is encountered in the input.
 In this case,
@@ -80,6 +80,39 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 \fI*inbuf\fP
 is left pointing to the beginning of the invalid multibyte sequence.
 .IP \[bu]
+A multibyte sequence is encountered that is valid but that cannot be
+translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion
+descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict: lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified, transliteration can avoid this condition in some cases.
+In the musl C library, this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either, because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
 The input byte sequence has been entirely converted,
 that is, \fI*inbytesleft\fP has gone down to 0.
 In this case,
-- 
2.34.1
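
For readers who want to see the new condition from the caller's side, here is a minimal sketch in C. It is not part of the patch. The helper name convert_string and its parameters are hypothetical, chosen only for this illustration; the only library calls used are iconv_open(3), iconv(3), and iconv_close(3). The sketch assumes that the encoding names "UTF-8" and "ASCII" are accepted by the local implementation.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not from the patch).
   Converts the NUL-terminated string IN from encoding FROM to encoding TO
   and stores a NUL-terminated result in OUT, which has room for OUTSIZE
   bytes.  Returns 0 on success, or the errno value of the failing call.
   If STOPPED_AT is non-NULL, it receives the byte offset in IN at which
   iconv() stopped; for EILSEQ this is the start of the invalid or
   unconvertible multibyte sequence.  */
int convert_string(const char *from, const char *to, const char *in,
                   char *out, size_t outsize, size_t *stopped_at)
{
    if (outsize == 0)
        return E2BIG;

    /* No //TRANSLIT or //IGNORE suffix: in glibc and GNU libiconv this
       requests a strict conversion.  */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno;

    char *inptr = (char *)in;       /* iconv() does not modify the input */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;   /* keep one byte for the terminator */
    int err = 0;

    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1)
        err = errno;
    /* On success, emit any pending shift sequence of a stateful encoding. */
    else if (iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t)-1)
        err = errno;

    *outptr = '\0';
    if (stopped_at != NULL)
        *stopped_at = (size_t)(inptr - in);

    iconv_close(cd);
    return err;
}
```

Calling convert_string("UTF-8", "ASCII", "a\xc2\xbd", buf, sizeof buf, &off) with the UTF-8 encoding of "a½" is expected, per the text of the patch, to return EILSEQ with off == 1 on glibc and GNU libiconv. On musl, FreeBSD, NetBSD, and Solaris it should instead succeed with a fallback character in buf. Passing "ASCII//TRANSLIT" as the target on glibc may avoid the error by producing a transliteration.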