Re: [PATCH] man7/: ffix

Alejandro Colomar <alx.manpages@xxxxxxxxx> · Fri, 24 Mar 2023 01:29:59 +0100

Hey Branden!

On 3/23/23 23:29, G. Branden Robinson wrote:
> 
>> Going more into what concerns me, which is man3, I often miss an
>> ARGUMENTS (or PARAMETERS, to be more precise) section in the pages for
>> functions.  Sometimes it would be just one line per argument, but in
>> other cases it would help a lot to have more organized information.  I'll
>> show you a few cases where I've used it, and where I think it made a
>> difference.
>>
>> <https://github.com/shadow-maint/shadow/blob/master/lib/stpecpy.h>
>> <https://github.com/shadow-maint/shadow/blob/master/lib/stpeprintf.h>
>> <https://github.com/shadow-maint/shadow/blob/master/libmisc/agetpass.c>
> 
> I don't have strong feelings about this.  A deeper principle I hold is
> that functions shouldn't take a lot of arguments in the first place.  If
> they do, it is a sign that
> 
> 1. a data structure is called for, and a pointer to it should be passed;
> 
> and/or
> 
> 2. the function is too complex, tries to do too much, and should be
>    decomposed into orthogonal features.
> 
> The latter doesn't mean you can't also provide a convenience function to
> handle common cases, or show the user how to implement one.  To recall
> an old disagreement of ours, this is why I prefer memset() to bzero()
> as a standard library function.  (Yes, memset() takes more arguments,
> but it is also more _general_.  But I digress...)

While it is more general, I have yet to be shown a list of uses for it.
I recall 1 use in my entire life (not too long, I know).  It was for
initializing an array of bitfields to all-bits-1.  I would expect that a
function that allows doing that would be something obscure that is rarely
used.

While bzero(3) and memset(3) are similar in implementation, they are
rather different in their abstract semantics: one zeroes a buffer.
The other allows two uses (that I know of): initializing to 1s, in the
rare case where you want all 1s; and initializing memory to some magic
pattern to be able to detect uses of invalid memory.
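
To make the distinction concrete, here is a minimal sketch of the three
uses.  The helper names and the 0xA5 poison value are made up for
illustration; they don't come from any existing code.

```c
#define _DEFAULT_SOURCE  /* may be needed for bzero(3) on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

#define POISON_BYTE  0xA5  /* hypothetical magic pattern */

/* The common case: zero a buffer. */
static void
buf_zero(void *buf, size_t n)
{
	assert(buf != NULL);
	bzero(buf, n);
}

/* Rare case 1: set every bit to 1. */
static void
buf_ones(void *buf, size_t n)
{
	assert(buf != NULL);
	memset(buf, 0xFF, n);
}

/* Rare case 2: poison memory so that stale reads stand out. */
static void
buf_poison(void *buf, size_t n)
{
	assert(buf != NULL);
	memset(buf, POISON_BYTE, n);
}
```

With something like this, grepping for memset() would only hit the two
rare helpers, and every plain zeroing would show up as bzero().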

It would be nice if grepping for memset() would show these rare cases
only, rather than being needles in a huge haystack of zeroing.  A regex
might help, but still...

I would even go further and say that libc doesn't need memset(3).  It's
such a niche function, that we don't really need it in the most
essential library.  I mean, we still don't have strlcpy(3) in some libc
implementations, and it's quite useful.  Why don't we ask users to
implement their own loop for the rare case they want to initialize their
buffers to 0xF0?  Why not a memset32(3) to initialize arrays of
uint32_t?  What's so special about non-zero byte initialization?  If you
need memset(3) to be optimized, you can write libmemset and write it in
assembly...  Can anyone justify the existence of memset(3) in libc?
Apart from the obvious "because we already had it, so why remove it" or
"because ISO C says so".  If we had none of them in libc, and were
presented both bzero(3) and memset(3), I'd go for bzero(3) 10 out of 10
times, as it's essential, and would ask those interested in memset(3) to
write their own niche library.
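
For reference, the "own loop" I have in mind would be roughly the
following.  Both functions are hypothetical helpers with names I made up
for this mail; they are not part of any libc, and they are unoptimized
sketches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: fill n elements of a uint32_t array with v. */
static uint32_t *
my_memset32(uint32_t *dst, uint32_t v, size_t n)
{
	assert(dst != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		dst[i] = v;
	return dst;
}

/* Hypothetical: fill n bytes with a non-zero pattern such as 0xF0. */
static void *
my_memfill(void *dst, unsigned char c, size_t n)
{
	unsigned char *p = dst;

	assert(dst != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		p[i] = c;
	return dst;
}
```

A caller would write something like my_memfill(buf, 0xF0, sizeof(buf)),
which is no harder to read than the memset(3) call it replaces.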

> But it went in anyway, apparently, on the strength of the functionality.
> Maybe the prospect of fighting Multics on its own ground was too
> appealing to pass up.
> 
>> It's kind of a synopsis of the parameters.  Would it be better _after_
>> the description?  Maybe.
> 
> Maybe not.  _If_ you're going to have an "Arguments" heading for a
> section 2 or 3 man page, placing it between "Synopsis" and "Description"
> seems appropriate.

Yep.

> 
>> Is it better than having it all in the description?  I think it is.
>> Will we see this in the Linux man-pages some day?  Maybe.  What's your
>> opinion?
> 
> I think you should collect more opinions.

Yeah, I don't have any plans for that now.  While I use that section
in other projects where I write man pages or man-page-like documentation
from scratch, here, where we don't use it, it would be more work than I'm
willing to do for now.  Maybe in something like several years, I could
consider that, if other people want it.

>  Also consider going back to
> the Unix Programmer's Manuals of the 1970s and see how they tackled the
> issue.  The complications of history are not going to make _every_
> simplification impossible.  And you may well find places where these
> manuals were ill-written or the API badly designed.  (Inter-process
> communication was not born elegant in Unix and still isn't to this day.)
> 
>> Yup, I think the man pages should serve as both (short) tutorials
>> *and* quick references.  If I need further info, I go to
>> StackOverflow, but I'd like to understand at least the basics of a
>> function when reading its page (and I've learnt many of the man3
>> functions by reading the pages while maintaining them; for example, I
>> didn't even know there was a regex(3) function until I saw the page
>> being mentioned in a ffix patch by Michael; a few weeks later I needed
>> it, and could use it by just reading the manual; then I added the
>> example program with something close to what I did with it).
> 
> I learned years ago that the only way I can truly learn anything that
> isn't simple is to start rewriting its documentation, which usually
> means conducting a lot of experiments.  In the 6 years or so I've been
> contributing to groff I've amassed a set of 1,433 files in my
> "EXPERIMENTS" subdirectory.  I've also thrown many experiments away.

Heh, me too :).  Probably one of the most-run commands in my terminals
is `cd ~/tmp`.  However, I always discard them (I keep them maybe for a
week, until the directory grows too much and `rm -rf *` does its job);
organizing that mess of 10-liner programs is too much for me :p.

> 
> Other people may have an easier time forming accurate models of
> programming systems in their heads, but for me the right approach
> appears to be radical skepticism combined with a record of findings
> (i.e., expanding or correcting the documentation where appropriate).

Well, once I write the test, it at least gets into my brain so I can
discard it.  Cache misses of that kind don't seem to be a big issue in
my head.  I have good video memory.  :-)

> 
>> Something I do is first look at the synopsis, have a quick look at the
>> description searching for one line that describes each argument, and
>> then look at the example program to guess myself about the function.
>> Only after that is when I try to read the entire page to know the
>> details.  But most of a function should be obvious already before
>> reading the description, or the design of the function would be
>> dubious.
> 
> I broadly agree.  This is one reason naming things well is important.

Guess what's been the discussion of the day?  Why do we find bool more
readable than uint8_t (with a comment saying /* 1 bit */) for boolean
variables, and true/false vs 1/0.  lol.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5