The goal of this new manual page is to help people create programs that
do the right thing even in the face of unusual paths.

The information that I used to create this new manual page came from
these sources:

• <https://unix.stackexchange.com/a/39179/316181>
• <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
• <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
• <man:unix(7)>
• <https://unix.stackexchange.com/q/92426/316181>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
Here’s what I changed from the previous version:

• I removed the second iconv() call.
• I made utf32_pathname const. I think that that was the only one that
  could be made const, but correct me if I’m wrong.
• I changed the order of the variable declarations. I think that they’re
  in the correct order now, but correct me if I’m wrong.
• I removed the curly brackets from all of the if statements.
• I renamed inbytesleft to inbytes and outbytesleft to outbytes.
• I replaced the \\ with \[rs].

 man/man7/pathname.7 | 160 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 man/man7/pathname.7

diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
new file mode 100644
index 000000000..5864f230d
--- /dev/null
+++ b/man/man7/pathname.7
@@ -0,0 +1,160 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH pathname 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+pathname,
+filename
+\-
+how pathnames are encoded and interpreted
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with pathnames,
+there are kernel-space requirements that you must comply with,
+and user-space requirements that you should comply with.
+.P
+The kernel receives pathnames as null-terminated byte sequences.
+The kernel has a few general rules that apply to all pathnames:
+.IP \[bu] 3
+The last byte in the sequence needs to be a null byte.
+.IP \[bu]
+Any other bytes in the sequence need to be non-null bytes.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/)
+and cannot be part of a filename.
+.IP \[bu]
+A pathname can be at most PATH_MAX bytes long, including the terminating null byte.
+PATH_MAX is defined in
+.BR limits.h (0p)\
+\.
+A pathname that’s longer than PATH_MAX bytes
+can be split into multiple smaller pathnames and opened piecewise using
+.BR openat (2).
+.IP \[bu]
+The maximum length of a filename, in bytes,
+is filesystem-specific.
+You can get the filename length limit for a currently mounted filesystem
+by passing _PC_NAME_MAX to
+.BR fpathconf (3)\
+\.
+For maximum robustness, programs should be able to handle filenames
+that are as long as the relevant filesystems will allow.
+For maximum portability, programs and users should limit the length
+of their own filenames to NAME_MAX bytes.
+NAME_MAX is defined in
+.BR limits.h (0p)\
+\.
+.P
+The kernel also has some rules that only apply in certain situations.
+Here are some examples:
+.IP \[bu] 3
+Filenames on the ext4 filesystem can be at most 255 bytes long.
+.IP \[bu]
+Filenames on the vfat filesystem cannot contain a
+0x22, 0x2A, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C, or 0x7C byte
+(", *, :, <, >, ?, \[rs], or | in ASCII)
+unless the filesystem was mounted with iocharset set to something unusual.
+.IP \[bu]
+A UNIX domain socket’s sun_path can be at most 108 bytes long (see
+.BR unix (7)
+for details).
+.P
+User space treats pathnames differently.
+User-space applications typically expect pathnames to use
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
+Pathnames should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
+For maximum interoperability,
+programs and users should also limit
+the characters that they use for their own pathnames to characters in
+.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
+the POSIX Portable Filename Character Set
+.UE .
+.SH EXAMPLES
+The following program demonstrates
+how to ensure that a pathname uses the proper encoding.
+The program starts with a UTF-32 encoded pathname.
+After initializing the locale from the environment, it calls
+.BR nl_langinfo (3)
+in order to determine what the current locale’s codeset is.
+After that, it uses
+.BR iconv (3)
+to convert the UTF-32 encoded pathname into a locale codeset encoded pathname.
+Finally, the program uses the locale codeset encoded pathname to create
+a file that contains the message “Hello, world!”
+.SS Program source
+.\" SRC BEGIN (pathname_encoding_example.c)
+.EX
+#include <err.h>
+#include <iconv.h>
+#include <langinfo.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <uchar.h>
+\&
+#define NELEMS(a) (sizeof(a) / sizeof(a[0]))
+\&
+int
+main(void)
+{
+	char *inbuf;
+	char *locale_pathname;
+	char *outbuf;
+	FILE *fp;
+	size_t iconv_result;
+	size_t inbytes;
+	size_t outbytes;
+	size_t size;
+	iconv_t cd;
+	const char32_t utf32_pathname[] = U"example";
+\&
+	if (setlocale(LC_ALL, "") == NULL)
+		err(EXIT_FAILURE, "setlocale");
+\&
+	size = NELEMS(utf32_pathname) * MB_CUR_MAX;
+	locale_pathname = malloc(size);
+	if (locale_pathname == NULL)
+		err(EXIT_FAILURE, "malloc");
+\&
+	cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
+	if (cd == (iconv_t) \-1)
+		err(EXIT_FAILURE, "iconv_open");
+\&
+	inbuf = (char *) utf32_pathname;
+	inbytes = sizeof utf32_pathname;
+	outbuf = locale_pathname;
+	outbytes = size;
+	iconv_result =
+		iconv(cd, &inbuf, &inbytes, &outbuf, &outbytes);
+	if (iconv_result == (size_t) \-1)
+		err(EXIT_FAILURE, "iconv");
+\&
+	if (iconv_close(cd) == \-1)
+		err(EXIT_FAILURE, "iconv_close");
+\&
+	fp = fopen(locale_pathname, "w");
+	if (fp == NULL)
+		err(EXIT_FAILURE, "fopen");
+	if (fputs("Hello, world!\[rs]n", fp) == EOF)
+		err(EXIT_FAILURE, "fputs");
+	if (fclose(fp) == EOF)
+		err(EXIT_FAILURE, "fclose");
+\&
+	free(locale_pathname);
+	exit(EXIT_SUCCESS);
+}
+.EE
+.\" SRC END
+.SH SEE ALSO
+.BR limits.h (0p),
+.BR open (2),
+.BR fpathconf (3),
+.BR iconv (3),
+.BR nl_langinfo (3),
+.BR path_resolution (7),
+.BR mount (8)
-- 
2.47.1