Re: [PATCH] iconv.3: Clarify the behavior when input is untranslatable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alejandro Colomar wrote:
> This patch adds language that reflects the actual behavior, by adding an
> explicit bullet that distinguishes this case.

That is the right approach. Thanks for taking the initiative.

But I think that more details should be added, so that programmers are
not surprised if their program behaves differently on, say, musl libc
or FreeBSD than on glibc.

Find attached my take to describe the condition appropriately.

Bruno

>From bc3102bd88b2c481d49cdb3433d8520d1289271b Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@xxxxxxxxx>
Date: Sun, 21 May 2023 13:05:29 +0200
Subject: [PATCH] List a fifth conditions when iconv(3) may stop.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso <steffen@xxxxxxxxxx>
Reported-by: Reuben Thomas <rrt@xxxxxxxx>
Signed-off-by: Bruno Haible <bruno@xxxxxxxxx>
---
 man3/iconv.3 | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..b440da578 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful, the
 function can also convert a sequence of input bytes
 to an update to the conversion state without producing any output bytes;
 such input is called a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
 .IP \[bu] 3
 An invalid multibyte sequence is encountered in the input.
 In this case,
@@ -80,6 +80,34 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 \fI*inbuf\fP
 is left pointing to the beginning of the invalid multibyte sequence.
 .IP \[bu]
+A multibyte sequence is encountered that is valid but that cannot be
+translated to the character encoding of the output.  This condition
+depends on the implementation and on the conversion descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict: lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified, transliteration can avoid this condition in some cases.
+In the musl C library, this condition cannot occur because a conversion to
+.B '*'
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv ,
+this condition cannot occur either, because a conversion to
+.B '?'
+is used as a fallback.
+When this condition is met,
+.B iconv
+sets \fIerrno\fP to \fBEILSEQ\fP and returns
+.IR (size_t)\ \-1 .
+\fI*inbuf\fP
+is left pointing to the beginning of the invalid multibyte sequence.
+.IP \[bu]
 The input byte sequence has been entirely converted,
 that is, \fI*inbytesleft\fP has gone down to 0.
 In this case,
-- 
2.34.1


[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux