On Mon, Jan 20, 2025 at 12:14:42PM +0100, Alejandro Colomar wrote: > Hi Florian, Jason, > > On Mon, Jan 20, 2025 at 09:20:27AM +0100, Florian Weimer wrote: > > Character sets used by glibc locales must be mostly ASCII-transparent. > > This includes the mapping of the null byte. It is possible to create > > locales that do not follow these rules, but they tend to introduce > > security vulnerabilities, particularly if shell metacharacters are > > mapped differently. > > Thanks! Then, Jason, please use terminated strings in the example, and > assume a glibc locale. OK. I’ll submit a new version of the patch that does that. > If one uses a locale that doesn't work like this, they'll have the call > fail because the converted null character won't fit, so the program > would still be safe. I disagree. I don’t think that the code would necessarily be safe if someone uses such a locale. Specifically, I think that the converted U+0000 null character would fit in the output buffer most of time. Imagine this scenario: 1. We use malloc() instead of calloc(). 2. The user uses a modified UTF-8 locale. Here’s what the example code would do in that scenario. First, it would calculate locale_pathname_size: size_t locale_pathname_size = len * MB_CUR_MAX + 1; There’s 8 characters in utf32_pathname, but lengths don’t include the final null terminator, so len is going to be 7. For modified UTF-8, MB_CUR_MAX would be 6. 7 * 6 + 1 = 43. We would then allocate 43 bytes: char *locale_pathname = malloc(locale_pathname_size); Here are our 43 bytes: UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU When I write a byte as being “UU” that means that the byte’s value is undefined. Next, we have iconv() convert the UTF-32 string to modified UTF-8. Here’s what our memory block will look like after iconv() has converted 7 out of the 8 characters in utf32_pathname: 65 78 61 6D 70 6C 65 UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU Now it’s time for iconv() to convert the final U+0000 null character. In modified UTF-8, U+0000 null is encoded as C0 80. iconv() will check to see if there’s enough room in outbuf for those two bytes. There is enough room for those two bytes, so iconv() will store those two bytes and finish without error: 65 78 61 6D 70 6C 65 C0 80 UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU UU In this scenario, iconv() finished successfully, but there aren’t any bytes that we know for a fact are null. The best we can hope for is that one of the undefined bytes just so happens to be null so that we don’t do an out-of-bounds read.