Re: [BUG 212887] [PATCH v2] getopt.3: Clarify behaviour

Hi James,

See some more comments below.

Thanks,

Alex

On 5/1/21 9:41 PM, Alejandro Colomar wrote:
> From: "James O. D. Hunt" <jamesodhunt@xxxxxxxxx>
> 
> Improved the `getopt(3)` man page in the following ways:
> 
> 1) Defined the existing term "legitimate option character".
> 2) Added an additional NOTE stressing that arguments are parsed in strict
>    order and the implications of this when numeric options are utilised.
> 3) Added a new WARNINGS section that alerts the reader to the fact they
>    should:
>    - Validate all option arguments.
>    - Take care if mixing numeric options and arguments accepting numeric
>      values.

Could you please split this into 2 patches?  Items 1 and 2 are closely
related, but 3 is quite different.

> 
> Signed-off-by: James O. D. Hunt <jamesodhunt@xxxxxxxxx>
> Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=212887>
> ---
> 
> Forward patch v2 from bugzilla to linux-man@.
> 
>  man3/getopt.3 | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/man3/getopt.3 b/man3/getopt.3
> index 921e747f8..810298505 100644
> --- a/man3/getopt.3
> +++ b/man3/getopt.3
> @@ -126,6 +126,11 @@ Then \fIoptind\fP is the index in \fIargv\fP of the first
>  .PP
>  .I optstring
>  is a string containing the legitimate option characters.
> +A legitimate option character is any visible one byte
> +.BR ascii (7)
> +character (for which
> +.BR isgraph (3)
> +would return nonzero) that is not dash (\(aq\-\(aq) or colon (\(aq:\(aq).
>  If such a
>  character is followed by a colon, the option requires an argument, so
>  .BR getopt ()
> @@ -402,6 +407,23 @@ routine that rechecks
>  .B POSIXLY_CORRECT
>  and checks for GNU extensions in
>  .IR optstring .)
> +.PP

Some more "semantic newline" cuts (//):

> +Command-line arguments are parsed in strict order // meaning that an option
> +requiring an argument will consume the next argument, // regardless of whether
> +that argument is the correctly specified option argument // or simply the next
> +option // (in the scenario the user mis-specifies the command line).
> +For example, if
> +.IR optstring
> +is specified as "1n:"
> +and the user incorrectly specifies the command line arguments as
> +\(aqprog\ \-n\ \-1\(aq, the

Please replace the quotes with italics (.I or .IR), and consider using
nonbreaking spaces ("\ ") within the command.

$ man 7 man-pages | sed -n '/Complete commands/,+11p';
       Complete commands should, if long, be written as  an  in‐
       dented  line  on  their own, with a blank line before and
       after the command, for example

           man 7 man-pages

       If the command is short, then it can be  included  inline
       in  the  text,  in italic format, for example, man 7 man‐
       pages.  In this case, it may be worth  using  nonbreaking
       spaces ("\ ") at suitable places in the command.  Command
       options should be written in italics (e.g., -l).

> +.I \-n
> +option will be given the
> +.B optarg
> +value \(aq\-1\(aq, and the

Given that in the case above -1 is a C string, double quotes would
probably be more appropriate.

> +.I \-1
> +option will be considered to have not been specified.
> +.PP
>  .SH EXAMPLES
>  .SS getopt()
>  The following trivial example program uses
> @@ -542,6 +564,45 @@ main(int argc, char **argv)
>      exit(EXIT_SUCCESS);
>  }
>  .EE
> +.PP
> +.SH WARNINGS

This should probably be a subsection in NOTES.

$ man 7 man-pages | sed -n '/^ *Where.*traditional/,/^$/p';
       Where  a  traditional heading would apply, please use it;
       this kind of consistency can make the information  easier
       to  understand.   If  you  must,  you can create your own
       headings if they make things easier to  understand  (this
       can  be especially useful for pages in Sections 4 and 5).
       However, before doing this, consider  whether  you  could
       use the traditional headings, with some subsections (.SS)
       within those sections.

> +Since
> +.BR getopt ()

Some more "semantic newline" cuts (//):

> +allows users to provide values to the program, // every care should be taken to
> +validate // every option value specified by the user calling the program.
> +.BR getopt ()
> +itself provides no validation so // the programmer should perform boundary value
> +checks on
> +.I every
> +argument to minimise the risk of bad input data being accepted by the program.
> +String values should be checked to // ensure they are not empty (unless
> +permitted), // sanitized appropriately and // that internal buffers used to store

Review wording (s/that internal/check that internal/?)

> +the string values returned in
> +.I optarg
> +are large enough to hold pathologically long values.

I'm not sure whether this extends these notes too much.  To me it is
obvious that any user input, especially strings, should be checked
paranoidly for every corner case.  I also checked scanf(3) and didn't
see any NOTES about that.

> +Numeric values should be verified to ensure they are within the expected
> +permissible range of values.
> +.PP
> +Further, since
> +.BR getopt ()
> +can handle numeric options (such as \(aq\-1\(aq or \(aq\-2\ foo\(aq), care should
> +be taken when writing a program that accepts both a numeric flag option and
> +an option accepting a numeric argument.
> +Specifically, the program should sanity check the numeric
> +.I optarg
> +value carefully to protect against the case where a user mis-specifies the
> +command line which could result in a numeric option flag being specified as
> +the
> +.I optarg
> +value for the numeric option by mistake.
> +For example, if
> +.IR optstring
> +is specified as "1n:" and the \(aqn\(aq option accepts a numeric value, if the
> +command line is specified accidentally as \(aqprog\ \-n\ \-1\(aq, care needs to
> +be taken to ensure the program does not try to convert the \(aq\-1\(aq passed
> +to the \(aqn\(aq option into an unsigned numeric value since that would result
> +in it being set to the largest possible integer value for the type used to
> +encode it.

I don't think we should warn about this.  If the user supplies an invalid
command line, they can't expect any particular behavior.  As long as the
program has no security problems, it doesn't need to do anything special
for invalid input.  The normal checks should still be done.
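
As a rough sketch of what such a normal check could look like (an
assumption on my part, not something the patch or the page prescribes),
the hypothetical helper parse_count() below converts an option argument
to an unsigned value and rejects anything that does not start with a
digit.  That matters because strtoul(3) accepts a leading minus sign and
negates the result in the unsigned type, so "-1" converts to ULONG_MAX
without setting errno.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert an option argument to an unsigned
 * value in the range [0, max].
 * Returns 0 on success and stores the value in *out;
 * returns -1 on invalid input and leaves *out untouched.
 */
int
parse_count(const char *s, unsigned long max, unsigned long *out)
{
    char *end;
    unsigned long v;

    /* Reject NULL, empty strings, signs, and leading whitespace;
       in particular, this rejects "-1". */
    if (s == NULL || !isdigit((unsigned char) s[0]))
        return -1;

    errno = 0;
    v = strtoul(s, &end, 10);

    /* Reject overflow, trailing garbage, and out-of-range values. */
    if (errno == ERANGE || *end != '\0' || v > max)
        return -1;

    *out = v;
    return 0;
}
```

With this, parse_count("-1", 100, &v) fails instead of silently storing
a huge value, while parse_count("42", 100, &v) succeeds.  Requiring a
leading digit is stricter than strtoul(3) itself; a program that wants
to accept a leading '+' or whitespace would need to relax that test.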

>  .SH SEE ALSO
>  .BR getopt (1),
>  .BR getsubopt (3)
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


