Re: [PATCH v10] man/man7/pathname.7: Add file documenting format of pathnames

Hi Jason,

On Tue, Jan 21, 2025 at 08:35:20AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths.  The information
> that I used to create this new manual page came from these sources:
> 
> • <https://unix.stackexchange.com/a/39179/316181>
> • <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> • <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
> • <man:unix(7)>
> • <https://unix.stackexchange.com/q/92426/316181>
> 
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>

Thanks!  I've applied the patch, with some tweaks:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=5e0b1cb79b88d3a78f60bf85bfd3a76df7c10307>

Feel free to send further patches.


Have a lovely night!
Alex

> ---
> Here’s what I changed from the previous version:
> 
> • I renamed inbuf to in and outbuf to out.
> • I removed the iconv_result variable.
> • I aligned and merged the variable declarations as requested.
> • I added parentheses to my use of sizeof.
> • I removed the leftover if statement.
> • I removed some unintentional spaces.
> 
>  man/man7/pathname.7 | 152 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 152 insertions(+)
>  create mode 100644 man/man7/pathname.7
> 
> diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
> new file mode 100644
> index 000000000..96e0009e1
> --- /dev/null
> +++ b/man/man7/pathname.7
> @@ -0,0 +1,152 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH pathname 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +pathname,
> +filename
> +\-
> +how pathnames are encoded and interpreted
> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with pathnames,
> +there are kernel-space requirements that you must comply with,
> +and user-space requirements that you should comply with.
> +.P
> +The kernel stores pathnames as null-terminated byte sequences.
> +The kernel has a few general rules that apply to all pathnames:
> +.IP \[bu] 3
> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/)
> +and cannot be part of a filename.
> +.IP \[bu]
> +A pathname can be at most PATH_MAX bytes long.
> +PATH_MAX is defined in
> +.BR limits.h (0p)\
> +\.
> +A pathname that’s longer than PATH_MAX bytes
> +can be split into multiple smaller pathnames and opened piecewise using
> +.BR openat (2).
> +.IP \[bu]
> +A filename can be at most a certain number of bytes long.
> +The number is filesystem-specific.
> +You can get the filename length limit for a currently mounted filesystem
> +by passing _PC_NAME_MAX to
> +.BR fpathconf (3)\
> +\.
> +For maximum portability, programs should be able to handle filenames
> +that are as long as the relevant filesystems will allow.
> +For maximum portability, programs and users should limit the length
> +of their own filenames to NAME_MAX bytes.
> +NAME_MAX is defined in
> +.BR limits.h (0p)\
> +\.
> +.P
> +The kernel also has some rules that only apply in certain situations.
> +Here are some examples:
> +.IP \[bu] 3
> +Filenames on the ext4 filesystem can be at most 255 bytes long.
> +.IP \[bu]
> +Filenames on the vfat filesystem cannot contain a
> +0x22, 0x2A, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C or 0x7C byte
> +(", *, :, <, >, ?, \ or | in ASCII)
> +unless the filesystem was mounted with iocharset set to something unusual.
> +.IP \[bu]
> +A UNIX domain socket’s sun_path can be at most 108 bytes long (see
> +.BR unix (7)
> +for details).
> +.P
> +User space treats pathnames differently.
> +User space applications typically expect pathnames to use
> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.
> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.
> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use for their own pathnames to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
> +.SH EXAMPLES
> +The following program demonstrates
> +how to ensure that a pathname uses the proper encoding.
> +The program starts with a UTF-32 encoded pathname.
> +It then calls
> +.BR nl_langinfo (3)
> +in order to determine what the current locale’s codeset is.
> +After that, it uses
> +.BR iconv (3)
> +to convert the UTF-32 encoded pathname into a locale codeset encoded pathname.
> +Finally, the program uses the locale codeset encoded pathname to create
> +a file that contains the message “Hello, world!”.
> +.SS Program source
> +.\" SRC BEGIN (pathname_encoding_example.c)
> +.EX
> +#include <err.h>
> +#include <iconv.h>
> +#include <langinfo.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <uchar.h>
> +\&
> +#define NELEMS(a)  (sizeof(a) / sizeof(a[0]))
> +\&
> +int
> +main(void)
> +{
> +    char     *locale_pathname;
> +    char     *in, *out;
> +    FILE     *fp;
> +    size_t   size;
> +    size_t   inbytes, outbytes;
> +    iconv_t  cd;
> +    const char32_t utf32_pathname[] = U"example";
> +\&
> +    if (setlocale(LC_ALL, "") == NULL)
> +        err(EXIT_FAILURE, "setlocale");
> +\&
> +    size = NELEMS(utf32_pathname) * MB_CUR_MAX;
> +    locale_pathname = malloc(size);
> +    if (locale_pathname == NULL)
> +        err(EXIT_FAILURE, "malloc");
> +\&
> +    cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
> +    if (cd == (iconv_t)\-1)
> +        err(EXIT_FAILURE, "iconv_open");
> +\&
> +    in = (char *) utf32_pathname;
> +    inbytes = sizeof(utf32_pathname);
> +    out = locale_pathname;
> +    outbytes = size;
> +    if (iconv(cd, &in, &inbytes, &out, &outbytes) == (size_t)\-1)
> +        err(EXIT_FAILURE, "iconv");
> +\&
> +    if (iconv_close(cd) == \-1)
> +        err(EXIT_FAILURE, "iconv_close");
> +\&
> +    fp = fopen(locale_pathname, "w");
> +    if (fp == NULL)
> +        err(EXIT_FAILURE, "fopen");
> +    fputs("Hello, world!\[rs]n", fp);
> +    if (fclose(fp) == EOF)
> +        err(EXIT_FAILURE, "fclose");
> +\&
> +    free(locale_pathname);
> +    exit(EXIT_SUCCESS);
> +}
> +.EE
> +.\" SRC END
> +.SH SEE ALSO
> +.BR limits.h (0p),
> +.BR open (2),
> +.BR fpathconf (3),
> +.BR iconv (3),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7),
> +.BR mount (8)
> -- 
> 2.47.1
> 

-- 
<https://www.alejandro-colomar.es/>
