Re: [PATCH v5] man/man7/pathname.7: Add file documenting format of pathnames

Jason Yundt <jason@jasonyundt.email> · Mon, 20 Jan 2025 08:17:00 -0500

On Mon, Jan 20, 2025 at 12:14:42PM +0100, Alejandro Colomar wrote:
> Hi Florian, Jason,
> 
> On Mon, Jan 20, 2025 at 09:20:27AM +0100, Florian Weimer wrote:
> > Character sets used by glibc locales must be mostly ASCII-transparent.
> > This includes the mapping of the null byte.  It is possible to create
> > locales that do not follow these rules, but they tend to introduce
> > security vulnerabilities, particularly if shell metacharacters are
> > mapped differently.
> 
> Thanks!  Then, Jason, please use terminated strings in the example, and
> assume a glibc locale.

OK.  I’ll submit a new version of the patch that does that.

> If one uses a locale that doesn't work like this, they'll have the call
> fail because the converted null character won't fit, so the program
> would still be safe.

I disagree.  I don’t think that the code would necessarily be safe if
someone uses such a locale.  Specifically, I think that the converted
U+0000 null character would fit in the output buffer most of time.

Imagine this scenario:

1. We use malloc() instead of calloc().
2. The user uses a modified UTF-8 locale.

Here’s what the example code would do in that scenario.  First, it would
calculate locale_pathname_size:

	size_t locale_pathname_size = len * MB_CUR_MAX + 1;

There’s 8 characters in utf32_pathname, but lengths don’t include the
final null terminator, so len is going to be 7.  For modified UTF-8,
MB_CUR_MAX would be 6.  7 * 6 + 1 = 43.  We would then allocate 43
bytes:

	char *locale_pathname = malloc(locale_pathname_size);

Here are our 43 bytes:

UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU
UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU

When I write a byte as being “UU” that means that the byte’s value is
undefined.  Next, we have iconv() convert the UTF-32 string to modified
UTF-8.  Here’s what our memory block will look like after iconv() has
converted 7 out of the 8 characters in utf32_pathname:

65 78 61 6D 70 6C 65 UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU
UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU

Now it’s time for iconv() to convert the final U+0000 null character.
In modified UTF-8, U+0000 null is encoded as C0 80.  iconv() will check to
see if there’s enough room in outbuf for those two bytes.  There is
enough room for those two bytes, so iconv() will store those two bytes
and finish without error:

65 78 61 6D 70 6C 65 C0 80 UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU
UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU

In this scenario, iconv() finished successfully, but there aren’t any
bytes that we know for a fact are null.  The best we can hope for is
that one of the undefined bytes just so happens to be null so that we
don’t do an out-of-bounds read.