Re: [PATCH v3] filename.7: new manual page

Hi, Florian!

On 10/19/21 1:05 PM, Thaddeus H. Black wrote:
On Tue, Oct 19, 2021 at 10:54:11AM +0200, Florian Weimer wrote:
Maybe add: “A pathname contains zero or more filenames.”

Okay.

What does this mean?  I think only byte 0x2f is reserved.  The UTF-8
comment is misleading.  A historic/overlong encoding of / in multiple
UTF-8 bytes is *not* reserved.

I had not known that UTF-8 had an alternate encoding for any ASCII
character.  Does it indeed have an alternate encoding?  If so, where
can I learn more?

The new filename(7) manual page wishes to be correct but, otherwise,
would like to inflict upon the reader as little difficult technical
prose as it can.  The page wants to remain readable.  In this light, can
you advise me how the page should speak to your point?
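A minimal C sketch of the overlong-encoding point. The helpers overlong2() and has_separator() are hypothetical names written only for this illustration. The first builds the two-byte overlong form of an ASCII character, which RFC 3629 forbids but which can still occur as raw bytes. The second checks for byte 0x2F. It assumes, as stated above, that only byte 0x2F is reserved, so an overlong "/" is just ordinary filename bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Encode an ASCII character (c < 0x80) in the *overlong* two-byte UTF-8
 * form 110xxxxx 10xxxxxx.  RFC 3629 forbids this form, so a conforming
 * decoder must reject it; it is built here only to show that such byte
 * sequences exist. */
static void overlong2(unsigned char c, unsigned char out[2])
{
    out[0] = (unsigned char)(0xC0 | (c >> 6));   /* high bits: 0 or 1 for ASCII */
    out[1] = (unsigned char)(0x80 | (c & 0x3F)); /* low six bits */
}

/* True if the byte string contains 0x2F, the single byte that acts as
 * the pathname separator. */
static bool has_separator(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == 0x2F)
            return true;
    return false;
}
```

For '/' the two bytes come out as 0xC0 0xAF. Neither is 0x2F, so a byte-oriented pathname parser sees no separator there. A conforming UTF-8 decoder must reject the sequence rather than map it to '/'.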

This conflicts with the presentation of / as a separator in pathnames, I
think: The pathname "/usr/" contains two empty filenames.

I had not thought of that.  Good point.

Thus, the empty filename is not forbidden but rather is reserved.

Not according to POSIX:

<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_271>
[
 3.271 Pathname

A string that is used to identify a file. In the context of POSIX.1-2017, a pathname may be limited to {PATH_MAX} bytes, including the terminating null byte. It has optional beginning <slash> characters, followed by zero or more filenames separated by <slash> characters. A pathname can optionally contain one or more trailing <slash> characters. Multiple successive <slash> characters are considered to be the same as one <slash>, except for the case of exactly two leading <slash> characters.

Note:
If a pathname consists of only bytes corresponding to characters from the portable filename character set (see Portable Filename Character Set), <slash> characters, and a single terminating <NUL> character, the pathname will be usable as a character string in all supported locales; otherwise, the pathname might only be a string (rather than a character string). Additionally, since the single-byte encoding of the <slash> character is required to be the same across all locales and to not occur within a multi-byte character, references to a <slash> character within a pathname are well-defined even when the pathname is not a character string. However, this property does not necessarily hold for the remaining characters within the portable filename character set.

    Pathname Resolution is defined in detail in Pathname Resolution.

3.272 Pathname Component

See Filename in Filename.
]

[
 3.170 Filename

A sequence of bytes consisting of 1 to {NAME_MAX} bytes used to name a file. The bytes composing the name shall not contain the <NUL> or <slash> characters. In the context of a pathname, each filename shall be followed by a <slash> or a <NUL> character; elsewhere, a filename followed by a <NUL> character forms a string (but not necessarily a character string). The filenames dot and dot-dot have special meaning. A filename is sometimes referred to as a "pathname component". See also Pathname.

Note:
    Pathname Resolution is defined in detail in Pathname Resolution.
]

According to the above, a pathname has no initial filename that is always present but possibly empty. It is the leading slash that is optional, and the first filename is the one that follows the optional leading slash(es). This matters especially on systems such as Cygwin, which give a special meaning to an initial '//' (POSIX leaves exactly two leading slashes implementation-defined).

Multiple successive non-initial slashes also do not have empty filenames between them; according to POSIX, they form a single separator, equivalent to one slash.

All of the above, AFAIK :)
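The splitting rule described above can be sketched in C. The function pathname_components() is a hypothetical helper, not a libc API. It skips leading slashes, treats each run of slashes as one separator, and ignores trailing slashes, so "/usr/" yields the single filename "usr" rather than two empty ones. It does not implement any special meaning for exactly two leading slashes, and it does not check NAME_MAX.

```c
#include <stddef.h>

/* Count the filenames (pathname components) in `path` per the POSIX
 * definitions quoted above.  If `starts` and `lens` are non-NULL, also
 * record the byte offset and length of up to `max` of them.  No empty
 * filename is ever produced. */
static size_t pathname_components(const char *path,
                                  size_t *starts, size_t *lens, size_t max)
{
    const char *p = path;
    size_t n = 0;

    for (;;) {
        while (*p == '/')          /* a run of slashes is one separator */
            p++;
        if (*p == '\0')            /* trailing slashes add nothing */
            break;

        const char *start = p;     /* a filename runs to the next '/' or NUL */
        while (*p != '/' && *p != '\0')
            p++;

        if (starts != NULL && lens != NULL && n < max) {
            starts[n] = (size_t)(start - path);
            lens[n] = (size_t)(p - start);
        }
        n++;
    }
    return n;
}
```

For example, "//usr///lib/" gives two filenames, "usr" at offset 2 and "lib" at offset 8, each 3 bytes long, while "/" gives none.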


+No filename may exceed\~255 bytes in length,
+or\~256 bytes after counting the terminating null byte.

This is not correct for Linux.  Despite the definition of NAME_MAX,
filenames can be longer than 255 bytes.  NTFS and CIFS have a limit of
255 UTF-16 code units, which translates to up to 765 bytes (at most 3
bytes per code unit) in the UTF-8 encoding typically used on Linux.
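A minimal sketch of the two numbers in play, assuming a POSIX system. pathconf(3) with _PC_NAME_MAX asks for the filename-length limit of the filesystem holding a given directory, instead of relying on the compile-time NAME_MAX (255 on Linux). Whether a particular driver such as CIFS reports its real limit this way is not verified here. The second helper is plain arithmetic for the UTF-16 case. Both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>   /* NAME_MAX: compile-time constant, 255 on Linux */
#include <unistd.h>   /* pathconf(), _PC_NAME_MAX */

/* Query the runtime filename-length limit for the filesystem holding
 * `dir`.  Returns -1 on error (errno set) or when no limit is reported
 * (errno left at 0). */
static long name_max_for(const char *dir)
{
    errno = 0;                         /* lets callers tell the two -1 cases apart */
    return pathconf(dir, _PC_NAME_MAX);
}

/* Worst-case UTF-8 size of a name limited to `units` UTF-16 code units:
 * a BMP character (1 unit) needs at most 3 UTF-8 bytes and a surrogate
 * pair (2 units) needs 4, so 3 bytes per unit is the upper bound. */
static long utf16_units_to_utf8_max(long units)
{
    return units * 3;
}
```

With units = 255 the bound is 765 bytes, well past NAME_MAX.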

I see.

Your feedback is helpful and appreciated (especially since you are the
first Fedora-class user to return a review).


Thank you both!

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
