Re: strncpy clarify result may not be null terminated

Jonny Grant <jg@xxxxxxxx> · Sat, 11 Nov 2023 20:55:41 +0000

On 10/11/2023 16:06, Matthew House wrote:
> On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
>>> So one can interpret strncpy(3) as copying a prefix of a character sequence
>>> into a buffer (and zero-filling the remainder), in which case you're
>>> correct that truncation cannot be detected. But the function is fomally
>>> defined as copying a prefix of a string into a buffer (and zero-filling the
>>> remainder), in which case the string has been truncated if the buffer
>>> doesn't end in a null byte afterward. It's just that one may not care about
>>> the terminating null byte being truncated if the user of the result just
>>> wants the initial character sequence.
>>
>> Yes, with the ISO C definition of strncpy(3), you can detect truncation.
>> The problem is that while my definition of it is complete, the
>> definition by ISO C makes it an incomplete function (to complete its
>> functionallity in copying strings, you need to add an explicit '\0'
>> after the call).  So I prefer mine, and for self-consistency, it can't
>> report truncation.
> 
> Personally, I'm a pragmatist, and I like to see it as kind of a duality: it
> can be used as part of a routine that copies part of a string and reports
> truncation, and it can also be used as a complete routine that copies part
> of a character sequence but can't report truncation. That reflects how it's
> used in practice. And it would hardly be the first such duality in C,
> either, given things like the fundamental practice of manipulating
> arbitrary objects as if they're character arrays.
> 
> (Some of these other dualities are similarly infamous in their room for
> error, e.g., forgetting to multiply by the element size when calling
> malloc(3), which I have often been guilty of myself. And still, a worrying
> amount of code neglects to test for multiplication overflow when doing
> this, even when the length comes from an untrusted source. Yet somehow I
> haven't seen any calls for a mallocarray(3) function to replace it. Ditto
> with memset(3), which can and has caused actual hard-to-notice bugs due to
> the first few elements looking correct even if the provided length is too
> short.)
> 
> But you're entitled to your opinion on how it ought to be best represented
> in the man page, as long as the immediate shortcoming of the function w.r.t
> producing strings is made very clear, even to readers who aren't in the
> habit of contemplating formal definitions. I'm satisfied by your patch in
> that regard.
> 
>>> That's a nice library that I didn't know about! Unfortunately, I don't
>>> think it's a very viable option for the long tail of small libraries I've
>>> referred to, which generally don't have any sub-dependencies of their own,
>>> apart from those provided by the platform.
>>>
>>> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
>>> configure scripts from whatever build system you're using (in such a way
>>> that libbsd can locate libmd), ensuring they're safe for cross-compilation
>>> if that's a goal, ensuring you bundle them in a way that respects their
>>> license terms, and ensuring that any user of your library links to the two
>>> dependencies and doesn't duplicate them. At that point, rolling your own
>>> strlcpy(3) equivalent definitely sounds like less mental load, at least to
>>> me.
>>
>> Yes, if you had 0 deps, it might be simpler to add your implementation.
>> Although it's a tricky function to implement, so I'd be careful.  If you
>> need to roll your own, I would go for a simpler function; maybe a
>> wrapper over strlen(3)+strcpy(3).
> 
> Such a wrapper would indeed be useful for detecting truncation, but a full
> strlcpy(3) equivalent would be necessary for permitting the truncation and
> continuing, which is the behavior of the majority of existing strncpy(3)-
> based code.
> 
> I don't deny that this truncation behavior is often done dubiously and
> rarely receives enough scrutiny, but a significant chunk of the uses really
> are just building an informative string which won't cause any harm if
> truncated, and installing additional control flow to handle truncation
> errors in places where there currently isn't any can introduce its own
> bugs.

Truncation seems risky, I can't think of many nice use-cases of truncation. Say it's a file path, truncation means the file path isn't accurate any more. Maybe a song title for a music player could be ok truncated, so just display first x characters of the song title etc. Doesn't feel great though. Maybe strings beyond an expected size, as a safety check. So a song title longer than the 255 bytes that the format allows could be truncated. (probably a missing NUL in the file, or a corrupt file)

>>> I didn't see this as an issue in practice when I was reviewing all those
>>> existing usages of strncpy(3). The vast majority were used in the midst of
>>> simple string manipulation, where the destination buffer starts as
>>> uninitialized or zeroed out, and ultimately gets passed into a user
>>> expecting an ordinary null-terminated string.
>>>
>>> (One exception was a few functions that used strncpy(dst, "", len) to zero
>>
>> Holy crap!  Didn't these programmers know bzero(3) or memset(3)?  :D

Perhaps that strncpy might get optimized out, if the memory modified isn't read again after memset(). So may need explicit_memset() for this situation.

>>> out the buffer, which is thankfully pretty obvious. Another exception was
>>> the functions that actually used strncpy(3) to produce a null-padded
>>> character sequence, e.g., when writing a value into a section of a binary.
>>> But in general, I found that it's usually not difficult to tell when a
>>> usage is being clever enough that the null padding might be significant.)
>>>
>>> In fact, the greater confusion came from the surprisingly common practice
>>> of using strncpy(3) like it's memcpy(3), by giving it the known length of
>>
>> It gets better!  :D
> 
> In all these cases, I think the function naming really is having somewhat
> of a psychological effect: the authors are wrangling with strthis(3) and
> strthat(3) for dozens of lines, so they'd find it scary to start mixing it
> up with mem*(3) functions ("I'm working with C strings, not with byte
> arrays!"), or perhaps they don't even consider it. They'd rather remain
> with strncpy(3), even when it means they have to manually append it with a
> null terminator or another string. But I'm no psychoanalyst, so take that
> with a big grain of salt.
> 
> (Meanwhile, in my own code, I try to work with pointer-and-length arrays
> whenever possible instead of fooling around with null terminators and all
> their off-by-one fun, so I've become leery of using any str*(3) functions
> apart from strlen(3) and strnlen(3).)

Do you mean passing a size_t around for the length of the src string? That saves needing to read memory counting bytes, which is a performance boost on big strings. Accessing memory to read or write unnecessarily is a performance drag.

I saw you mention off by 1 errors, I recall seeing some old code bases decades ago where they used to allocate an extra 2 bytes, just to avoid crashes in their buggy code, pretty bad stuff.

Kind regards
Jonny