Hi Matthew, On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote: > > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > > > We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > > > It's difficult to say, but I doubt it. But let me ask you something: > > In what cases would you find strncpy(3) appropriate to use, and why? > > Maybe if I understand that it helps. > > > > Kind regards, > > Alex > > Man pages aren't read only by people writing new code, but also by people > reading and modifying existing code. And despite your preferences regarding > which functions ought to be used to produce strings, it's a widespread (and > correct) practice to produce a string from the character sequence created > by strncpy(3). There are two ways of doing this, either by setting the last > character of the destination buffer to null if you want to produce a > truncated string, or by testing the last character against zero if you want > to detect truncation and raise an error. It is not strncpy(3) who truncated, but the programmer by adding a NULL in buff[BUFSIZ - 1]. In the following snippet, strncpy(3) will not truncate: char cs[3]; strncpy(cs, "foo", 3); And yet your code doing if (cs[2] != '\0') { goto error; } would think it did. That's because you deformed strncpy(3) to implement a poor man's strlcpy(3). char cs[3]; strncpy(cs, "foo", 3); cs[2] = '\0'; // The truncation is here, not in strncpy(3). > I'm not aware of any alternative to a strncpy(3)-based snippet for > producing a possibly-truncated copy of a string, except for your preferred > strlcpy(3) or stpecpy(3), which aren't available to anyone without a The Linux kernel has strscpy(3), which is also good, but is not available to user space. > brand-new glibc (nor, by extension, any applications or libraries that want libbsd has provided strlcpy(3) since basically forever. It is a very portable library. You don't need a brand-new glibc for having strlcpy(3). <https://libbsd.freedesktop.org/wiki/> > to support people without a brand-new glibc, nor any libraries that want to > support other platforms like Windows with only ISO C and POSIX-ish If you program for Windows, it depends. If you have POSIX available, you may be able to port libbsd; I don't know. In any case, I don't case about Windows enough. You could always write your own string- copying function for Windows. > functions); snprintf(3), which has the insidious flaw of not supporting > more than INT_MAX characters on pain of UB, and also produces a warning if > the compiler notices the possible truncation; or strlen(3) + min() + > memcpy(3) + manually adding a null terminator, which is certainly more > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if > that poses a performance problem, but similarly opens up room for > off-by-one errors. More than the performance problem, I'm more worried about the maintainability of strncpy(3). When 20 years from now, a programmer reading a piece of code full of strncpy(3) wants to migrate to a sane function like strlcpy(3) or strcpy(3), the programmer needs to understand if the zeroing was purposeful or just accidental. Because by using strlcpy(3), it may start leaking some trailing data if the trailing of the buffer is meaningful to some program. > > For the sake of reference, I looked into a few big C and C++ projects to > see how often a strncpy(3)-based snippet was used to produce a truncated > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > that I haven't filtered out vendored dependencies, so there's a little bit > of double-counting.) It seems like most codebases that don't ban strncpy(3) > use a derived snippet somewhere or another. Also, I found 3 instances in > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > checking the last character. I know. I've been rewriting the code handling strings in shadow-utils for the last year, and ther was a lot of it. I fixed several small bugs in the process, so I recommend avoiding it. > > So these two snippets really are widespread, especially among the long tail > of smaller C and C++ applications and libraries that don't perform enough > string manipulation that it warrants creating a custom set of more- > foolproof wrapper functions (at least, in the opinion of their authors). > Thus, since they're not going away, it would be useful for anyone reading > the code to understand the concept behind how these two snippets work, that > the only difference between the strncpy(3)'s special "character sequence" > and an ordinary C string is an additional null terminator at the end of the > destination buffer. This is part of string_copying(7): DESCRIPTION Terms (and abbreviations) string (str) is a sequence of zero or more non‐null characters followed by a null byte. character sequence is a sequence of zero or more non‐null characters. A program should never use a character sequence where a string is required. However, with appropriate care, a string can be used in the place of a character sequence. I think that is very explicit in the difference. strncpy(3) refers to that page for understanding the differences, so I think it is documented. strncpy(3): CAVEATS The name of these functions is confusing. These functions produce a null‐padded character sequence, not a string (see string_copying(7)). > > In other words, strncpy(3) doesn't create a truncated string, but it > creates something which can be easily turned into to a truncated string, > and that's its most relevant quality for most of its uses in existing code. > Further, apart from snprintf(3), there's no other portable way to produce a > truncated string without manual arithmetic. Thus, I'd also find it Portable is relative. With libbsd, you can port to most POSIX systems. Windows is another story. > reasonable to highlight precisely why strncpy(3)'s output isn't a string How about this?: diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 index d4c2ce83d..c80c8b640 100644 --- a/man3/stpncpy.3 +++ b/man3/stpncpy.3 @@ -108,7 +108,10 @@ .SH HISTORY .SH CAVEATS The name of these functions is confusing. These functions produce a null-padded character sequence, -not a string (see +not a string. +While strings have a terminating NUL byte, +character sequences do not have any terminating byte +(see .BR string_copying (7)). .P It's impossible to distinguish truncation by the result of the call, > (viz., the lack of a null terminator), instead of trying to insist that its > output is worlds apart from anything string-related, especially given the > volume of existing correct code that belies that notion. It is not correct code. That code is doing extra work which confuses maintainers. It is a lot like writing dead code, since you're writing zeros that nobody is reading, which confuses maintainers. Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, it's not correct code. It's rather dangerous code that just happens to not be vulnerable most of the time. > > Or, to answer your question, "It's appropriate to keep using strncpy(3) in > existing code where it's currently used as part of creating a truncated > string, and it's not especially inappropriate to use strncpy(3) in new code > as part of creating a truncated string, if the code must support platforms > without strlcpy(3) or similar, and if the resulting snippets are few enough > and well-commented enough that they create less mental load than creating > and maintaining a custom helper function." strncpy(3) calls are never well documented. Do you add a comment in each such call saying "this zeroing is superfluous"? Probably not. > > (As an aside, I find the remark in the man page that "It's impossible to > distinguish truncation by the result of the call" extremely misleading at > best, since truncation can easily be distinguished by inspecting the last > output character.) Again, strncpy(3)'s truncation is impossible to detect. What you can detect is that your construct that resembles strlcpy(3) truncates, which is a different thing. Thanks, Alex > > Thank you, > Matthew House -- <https://www.alejandro-colomar.es/>
Attachment:
signature.asc
Description: PGP signature