On 10/11/2023 16:06, Matthew House wrote: > On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote: >>> So one can interpret strncpy(3) as copying a prefix of a character sequence >>> into a buffer (and zero-filling the remainder), in which case you're >>> correct that truncation cannot be detected. But the function is fomally >>> defined as copying a prefix of a string into a buffer (and zero-filling the >>> remainder), in which case the string has been truncated if the buffer >>> doesn't end in a null byte afterward. It's just that one may not care about >>> the terminating null byte being truncated if the user of the result just >>> wants the initial character sequence. >> >> Yes, with the ISO C definition of strncpy(3), you can detect truncation. >> The problem is that while my definition of it is complete, the >> definition by ISO C makes it an incomplete function (to complete its >> functionallity in copying strings, you need to add an explicit '\0' >> after the call). So I prefer mine, and for self-consistency, it can't >> report truncation. > > Personally, I'm a pragmatist, and I like to see it as kind of a duality: it > can be used as part of a routine that copies part of a string and reports > truncation, and it can also be used as a complete routine that copies part > of a character sequence but can't report truncation. That reflects how it's > used in practice. And it would hardly be the first such duality in C, > either, given things like the fundamental practice of manipulating > arbitrary objects as if they're character arrays. > > (Some of these other dualities are similarly infamous in their room for > error, e.g., forgetting to multiply by the element size when calling > malloc(3), which I have often been guilty of myself. And still, a worrying > amount of code neglects to test for multiplication overflow when doing > this, even when the length comes from an untrusted source. Yet somehow I > haven't seen any calls for a mallocarray(3) function to replace it. Ditto > with memset(3), which can and has caused actual hard-to-notice bugs due to > the first few elements looking correct even if the provided length is too > short.) > > But you're entitled to your opinion on how it ought to be best represented > in the man page, as long as the immediate shortcoming of the function w.r.t > producing strings is made very clear, even to readers who aren't in the > habit of contemplating formal definitions. I'm satisfied by your patch in > that regard. > >>> That's a nice library that I didn't know about! Unfortunately, I don't >>> think it's a very viable option for the long tail of small libraries I've >>> referred to, which generally don't have any sub-dependencies of their own, >>> apart from those provided by the platform. >>> >>> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their >>> configure scripts from whatever build system you're using (in such a way >>> that libbsd can locate libmd), ensuring they're safe for cross-compilation >>> if that's a goal, ensuring you bundle them in a way that respects their >>> license terms, and ensuring that any user of your library links to the two >>> dependencies and doesn't duplicate them. At that point, rolling your own >>> strlcpy(3) equivalent definitely sounds like less mental load, at least to >>> me. >> >> Yes, if you had 0 deps, it might be simpler to add your implementation. >> Although it's a tricky function to implement, so I'd be careful. If you >> need to roll your own, I would go for a simpler function; maybe a >> wrapper over strlen(3)+strcpy(3). > > Such a wrapper would indeed be useful for detecting truncation, but a full > strlcpy(3) equivalent would be necessary for permitting the truncation and > continuing, which is the behavior of the majority of existing strncpy(3)- > based code. > > I don't deny that this truncation behavior is often done dubiously and > rarely receives enough scrutiny, but a significant chunk of the uses really > are just building an informative string which won't cause any harm if > truncated, and installing additional control flow to handle truncation > errors in places where there currently isn't any can introduce its own > bugs. Truncation seems risky, I can't think of many nice use-cases of truncation. Say it's a file path, truncation means the file path isn't accurate any more. Maybe a song title for a music player could be ok truncated, so just display first x characters of the song title etc. Doesn't feel great though. Maybe strings beyond an expected size, as a safety check. So a song title longer than the 255 bytes that the format allows could be truncated. (probably a missing NUL in the file, or a corrupt file) >>> I didn't see this as an issue in practice when I was reviewing all those >>> existing usages of strncpy(3). The vast majority were used in the midst of >>> simple string manipulation, where the destination buffer starts as >>> uninitialized or zeroed out, and ultimately gets passed into a user >>> expecting an ordinary null-terminated string. >>> >>> (One exception was a few functions that used strncpy(dst, "", len) to zero >> >> Holy crap! Didn't these programmers know bzero(3) or memset(3)? :D Perhaps that strncpy might get optimized out, if the memory modified isn't read again after memset(). So may need explicit_memset() for this situation. >>> out the buffer, which is thankfully pretty obvious. Another exception was >>> the functions that actually used strncpy(3) to produce a null-padded >>> character sequence, e.g., when writing a value into a section of a binary. >>> But in general, I found that it's usually not difficult to tell when a >>> usage is being clever enough that the null padding might be significant.) >>> >>> In fact, the greater confusion came from the surprisingly common practice >>> of using strncpy(3) like it's memcpy(3), by giving it the known length of >> >> It gets better! :D > > In all these cases, I think the function naming really is having somewhat > of a psychological effect: the authors are wrangling with strthis(3) and > strthat(3) for dozens of lines, so they'd find it scary to start mixing it > up with mem*(3) functions ("I'm working with C strings, not with byte > arrays!"), or perhaps they don't even consider it. They'd rather remain > with strncpy(3), even when it means they have to manually append it with a > null terminator or another string. But I'm no psychoanalyst, so take that > with a big grain of salt. > > (Meanwhile, in my own code, I try to work with pointer-and-length arrays > whenever possible instead of fooling around with null terminators and all > their off-by-one fun, so I've become leery of using any str*(3) functions > apart from strlen(3) and strnlen(3).) Do you mean passing a size_t around for the length of the src string? That saves needing to read memory counting bytes, which is a performance boost on big strings. Accessing memory to read or write unnecessarily is a performance drag. I saw you mention off by 1 errors, I recall seeing some old code bases decades ago where they used to allocate an extra 2 bytes, just to avoid crashes in their buggy code, pretty bad stuff. Kind regards Jonny