[PATCH v1b] iconv.3: Clarify the behavior when input is untranslatable

From: Reuben Thomas <rrt@xxxxxxxx>

The manual page does not fully reflect the behaviour of glibc's
iconv(3).  The manual page says:

    The conversion can stop for four reasons:

    -  An invalid multibyte sequence is encountered in the input.  In
       this case, it sets errno to EILSEQ and returns (size_t) -1.
       *inbuf is left pointing to the beginning of the invalid multibyte
       sequence.

    [...]

The phrase "An invalid multibyte sequence is encountered in the input"
is confusing, because it suggests that it refers only to the validity of
the input per se (e.g. a non-UTF-8 sequence in input purporting to be
UTF-8).

However, according to the original author of the manual page, Bruno
Haible[1], it also refers to input that cannot be translated to the
desired output encoding; and indeed, glibc's iconv returns EILSEQ when
the input cannot be translated, even though it is valid.

This patch adds language that reflects the actual behavior.

Link: [1] <https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4>
Link: <https://bugzilla.kernel.org/show_bug.cgi?id=217059>
Signed-off-by: Reuben Thomas <rrt@xxxxxxxx>
Cc: Steffen Nurpmeso <steffen@xxxxxxxxxx>
Cc: Bruno Haible <bruno@xxxxxxxxx>
Cc: Martin Sebor <msebor@xxxxxxxxxx>
Signed-off-by: Alejandro Colomar <alx@xxxxxxxxxx>
---

Hi,

I'm resending Reuben's patch inline, CCing all interested parties.  Like
Steffen, I'm not convinced that "invalid input" encompasses output
errors.  I think it would be better to split this into two separate
reasons, which would give us a 5th reason for the conversion to stop.

I also slightly tweaked the formatting of the commit log.

What do you think about having a 5th reason?

Cheers,
Alex
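
For anyone who wants to reproduce the behaviour under discussion, here
is a minimal sketch.  The helper name try_convert() and its interface
are hypothetical, made up for this illustration; only iconv_open(3),
iconv(3), and iconv_close(3) are real.  It assumes glibc and the
"UTF-8" and "ASCII" encoding names; other iconv implementations may
substitute a replacement character instead of failing.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): convert `inlen` bytes
 * of `in` from encoding `from` to encoding `to`, discarding the output.
 * Returns 0 on success, or the errno value of the failing call.
 * *consumed receives the number of input bytes iconv() consumed before
 * it stopped, i.e. the offset that *inbuf was left pointing at.
 */
int try_convert(const char *from, const char *to,
                const char *in, size_t inlen, size_t *consumed)
{
    char outbuf[64];            /* large enough for the short demo inputs */
    char *inp = (char *)in;     /* iconv() takes char **; input is not modified */
    char *outp = outbuf;
    size_t inleft = inlen;
    size_t outleft = sizeof(outbuf);
    iconv_t cd;
    int err = 0;

    assert(in != NULL && consumed != NULL);
    *consumed = 0;

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno;

    /* Save errno right away: iconv_close() below may overwrite it. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;

    *consumed = (size_t)(inp - in);
    iconv_close(cd);
    return err;
}
```

With glibc, try_convert("UTF-8", "ASCII", "a\xc3\xa9", 3, &n) is
expected to return EILSEQ with n == 1, although the input is valid
UTF-8; the invalid input "a\xff" fails the same way.  The caller cannot
tell the two cases apart from errno alone, which is the ambiguity the
patch documents.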

 man3/iconv.3 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..e8694ca12 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -73,7 +73,8 @@ .SH DESCRIPTION
 such input is called a \fIshift sequence\fP.
 The conversion can stop for four reasons:
 .IP \[bu] 3
-An invalid multibyte sequence is encountered in the input.
+A multibyte sequence is encountered in the input which is either invalid,
+or cannot be translated to the character encoding of the output.
 In this case,
 it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 .IR (size_t)\ \-1 .
-- 
2.40.1



