Re: [PATCH] scanf.3: Do not mention the ERANGE error

Hi Branden!

On 1/20/23 18:55, G. Branden Robinson wrote:
[re-ordering the mail I'm quoting]

Hi Alex,

I have some observations on your deprecation initiative and people's
reactions to it.

Sure :)


At 2023-01-20T14:12:07+0100, Alejandro Colomar wrote:
All implementations of sscanf(3) produce Undefined Behavior (UB),
AFAIK.  How much you consider UB to be a real-world issue differs for
each programmer, but I tend to consider all UB to be as bad as nasal
demons.  I'm not saying UB shouldn't exist, just that you shouldn't
invoke it.  And a function that is used for scanning user input is one
of those places where you really want to avoid invoking UB.

If there are common idioms that result in UB, it might be worth
documenting this in the man page, with a citation to the relevant
clause of the standard that declares it thus.

Okay.  See proposed diff below

 I agree that UB is
something to be avoided and I think most other programmers do too.  The
advantage to this approach is that if they disagree, they can take their
argument to the standards body instead of litigating it with you.

:)


This is similar but different to bzero(3).  bzero(3) was broken or
slow in some implementations.  That's probably why it was never added
to ISO C, and why POSIX later removed it.  The API wasn't bad, and in
fact it's great, I prefer it over memset(3).  The difference between
bzero(3) and sscanf(3) is that bzero(3) has now been fixed,

I still don't share your preference here.  The exposure of a more
general interface (memset) by a general-purpose library when the
implementation otherwise has no additional implementation cost is the
correct choice.

While I share your interest in general-purpose over specialized interfaces, which is in essence the spirit of Unix, I also believe that encapsulation is necessary for writing readable code.

Your (and many others') proposal of having a project-specific macro for bzero(3) seems reasonable in the absence of a standard name for it. However, given a POSIX-blessed (until recently) name for such an interface, I'd prefer sticking to it. Otherwise, we risk having bzero(), memzero(), zerobytes(), zero(), ... which is not crazy, but hey, I prefer fewer moving parts when reading code :)

As for removing from POSIX a function just because it's not generic... I have in mind a long list of such features that are equally trivial and unnecessary (and in some cases, they hurt unlike bzero(3), IMO), yet they haven't died. For a representative, let me present our friend:

	printf(3)

Oh boy, tell me it hurts your fingers to write fprintf(stdout, ) but not memset(, 0, ). At least with fprintf(3) the ordering of the parameters is obvious and I don't need to check the man page.


 If a given programmer's use cases are restricted such

It's not a single given programmer. memset(3) is likely to be the most obvious case where the thin wrapper is what you want to call. There are many uses for fprintf(3), and there are many uses for other such functions that have a thin wrapper in the same libc, but memset(3)? How often have you (or any code you know) used it with something other than the constant expression 0?

that one of the arguments to a general-purpose function is constant,
then that is exactly the time for them to write a macro or function
specific to their project to hide the complexity.

If you tilt your head right, this is similar to one of the ways closures
are used in other languages.

I'm fine with the function being implemented as a macro, although it would be better to have it as an inline function, so that -Os can produce smaller code if needed. In general, I don't like macros unless there's a need to avoid type conversions; for example for keeping arrays as arrays.
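
To illustrate the inline-function variant, here is a minimal sketch of such a project-local wrapper. The name memzero() is hypothetical (it is one of the candidate names listed earlier, not a libc or POSIX interface), and returning the pointer is just one possible design.

```c
#include <stddef.h>
#include <string.h>

/*
 * memzero() is a hypothetical project-local name, not a standard function.
 * It is a thin inline wrapper that fixes the fill byte of memset(3) to 0,
 * so callers cannot swap the value and length arguments by accident.
 */
static inline void *
memzero(void *p, size_t n)
{
	return memset(p, 0, n);
}
```

A caller would write memzero(buf, sizeof(buf)), and the compiler remains free to inline the call or not, for example under -Os.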


I could replace the "deprecated" statements with "see bugs",

I think you've hit upon one of the core drivers of resistance here.  A
problem with calling something "deprecated" is that it's often unstated
_who_ is doing the deprecation.  Traditionally, I think the Linux
man-pages have tended only to use this term in reference to one of the
standards bodies (WG14 or the Austin Group) formally employing it.

There are some pages which have single-handedly deprecated features with no standard or group doing so. I remember having seen a few pages do that, but they are all from prehistoric times, when standards didn't mean so much (or maybe there weren't such standards).


(Maybe I'm wrong, and Linux man-pages _has_ deprecated things in its own
authorial voice...but if other people also don't know that, it doesn't
matter, and confusion remains.)

Yes, they did. Well, confusion always happens when things change. I expect that to settle down. However, I'll try to improve my methods for deprecating broken stuff as much as I can so we can reduce the confusion.


So I suggest you adopt a new phrase, like "discouraged by Linux
man-pages", to characterize the authorial voice here.  Some people will
ignore your advice either way, but at least they'll know who they're
ignoring.[1]

I like deprecating. I want such a strong term. I'll try to clarify that it's the man-pages that do the deprecation, and not a standards body.


However, if somebody really wants to use that function, and would like
to fix it, I encourage that effort.  If the function is fixed, which
shouldn't be that hard, I'm fine removing the messages against its
usage in the manual.

Until that happens, I prefer to recommend strongly against
their usage in the manual.  And dict(1) seems to say that the verb for
that is "to deprecate" :)

Your dictionary is correct, but social knowledge, a.k.a. tradition and
folklore, imposes a context on the discussion.  Sometimes dumb things
become tradition (like calculating factorials or Fibonacci numbers with
recursive functions[2])--we don't have to acquiesce to that, but we will
have to document and sometimes defend our rejection of them.

Right.  memcpy(3) has a bug in the standard.  However, implementations
do the Right Thing (tm).  If implementations did the right thing for
sscanf(3), that would be enough to remove the recommendation against
it.  But my understanding is that the sscanf(3) implementation is not
free of that problem.

This is a good opportunity to say so in these terms.  "Linux man-pages
discourages use of sscanf [under the conditions XXX] until
implementations are corrected to avoid undefined behavior [cite URL
here]."[3]

Regards,
Branden

[1] In groff_man(7), I admit I have not taken my own advice, and use the
     term "deprecated" in a subsection heading.  I have two defenses for
     this.  (1) I reorganized the man page along those lines 5-6 years
     ago, when I had less practice at writing technical documentation,
     and (2) the man(7) macros are not formally standardized anywhere
     anyway.  There is no "official" body with which to conflict, or with
     whom groff can be confused by the reader.

     After groff 1.23 is released (good news, I heard from Bertrand last
     weekend)

Nice :)

I hope to add the SunOS extension "SB" to the deprecation
     list now that Solaris's death seems irreversible.

[2] https://sleeplessafternoon.wordpress.com/2013/03/26/examples-of-recursion-the-good-the-bad-and-the-silly/

     For the mathematically or algorithmically inclined, I also
     recommend "The Genuine Sieve of Eratosthenes", by Melissa O'Neill.

     https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

[3] groff_man(7) gives you UR/UE, so use them!  >:-)

How about the following?

Cheers,

Alex

---

diff --git a/man3/sscanf.3 b/man3/sscanf.3
index 26a02521b..870c6f54b 100644
--- a/man3/sscanf.3
+++ b/man3/sscanf.3
@@ -653,6 +653,25 @@ .SS The 'a' assignment-allocation modifier
 .I gcc\~\-std=c99
 etc.).
 .SH BUGS
+.SS Numeric conversion specifiers
+Use of the numeric conversion specifiers produces Undefined Behavior
+for invalid input.
+See
+.UR https://port70.net/\:%7Ensz/\:c/\:c11/\:n1570.html\:#7.21.6.2p10
+C11 7.21.6.2/10
+.UE .
+This is a bug in the ISO C standard,
+and not an inherent design issue with the API.
+However,
+current implementations are not safe from that bug,
+so it is not recommended to use them.
+Instead,
+programs should use functions such as
+.BR strtol (3)
+to parse numeric input.
+This manual page deprecates use of the numeric conversion specifiers
+until they are fixed by ISO C.
+.SS Nonstandard modifiers
 These functions are fully C99 conformant, but provide the
 additional modifiers
 .B q
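
As a rough sketch of the strtol(3) alternative recommended in the BUGS text above: the helper below parses a decimal int and rejects input that sscanf("%d") would leave undefined per C11 7.21.6.2/10, namely a value that cannot be represented in the target object. The name parse_int() and its 0 / -1 return convention are assumptions made for this example, not an existing interface.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * parse_int() is a hypothetical helper, not part of any libc.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * string has no digits, has trailing characters, or is out of range for int.
 */
static int
parse_int(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);

	if (end == s)
		return -1;	/* no conversion was performed */
	if (*end != '\0')
		return -1;	/* trailing characters after the number */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -1;	/* out of range for long or for int */

	*out = (int) v;
	return 0;
}
```

Unlike the numeric conversion specifiers, strtol(3) has defined behavior on overflow: it returns LONG_MAX or LONG_MIN and sets errno to ERANGE, which is what the range check above relies on.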


--
<http://www.alejandro-colomar.es/>
