Re: strncpy clarify result may not be null terminated

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matthew,

On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
> > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> > > We see things differently, I'm on the C standard side on this one. Would any information change your mind?
> >
> > It's difficult to say, but I doubt it.  But let me ask you something:
> > In what cases would you find strncpy(3) appropriate to use, and why?
> > Maybe if I understand that it helps.
> >
> > Kind regards,
> > Alex
> 
> Man pages aren't read only by people writing new code, but also by people
> reading and modifying existing code. And despite your preferences regarding
> which functions ought to be used to produce strings, it's a widespread (and
> correct) practice to produce a string from the character sequence created
> by strncpy(3). There are two ways of doing this, either by setting the last
> character of the destination buffer to null if you want to produce a
> truncated string, or by testing the last character against zero if you want
> to detect truncation and raise an error.

It is not strncpy(3) who truncated, but the programmer by adding a NULL
in buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
truncate:

	char cs[3];

	strncpy(cs, "foo", 3);

And yet your code doing if (cs[2] != '\0') { goto error; } would think
it did.  That's because you deformed strncpy(3) to implement a poor
man's strlcpy(3).  


	char cs[3];

	strncpy(cs, "foo", 3);
	cs[2] = '\0';  // The truncation is here, not in strncpy(3).

> I'm not aware of any alternative to a strncpy(3)-based snippet for
> producing a possibly-truncated copy of a string, except for your preferred
> strlcpy(3) or stpecpy(3), which aren't available to anyone without a

The Linux kernel has strscpy(3), which is also good, but is not
available to user space.

> brand-new glibc (nor, by extension, any applications or libraries that want

libbsd has provided strlcpy(3) since basically forever.  It is a very
portable library.  You don't need a brand-new glibc for having
strlcpy(3).

<https://libbsd.freedesktop.org/wiki/>

> to support people without a brand-new glibc, nor any libraries that want to
> support other platforms like Windows with only ISO C and POSIX-ish

If you program for Windows, it depends.  If you have POSIX available,
you may be able to port libbsd; I don't know.  In any case, I don't
case about Windows enough.  You could always write your own string-
copying function for Windows.

> functions); snprintf(3), which has the insidious flaw of not supporting
> more than INT_MAX characters on pain of UB, and also produces a warning if
> the compiler notices the possible truncation; or strlen(3) + min() +
> memcpy(3) + manually adding a null terminator, which is certainly more
> explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> that poses a performance problem, but similarly opens up room for
> off-by-one errors.

More than the performance problem, I'm more worried about the
maintainability of strncpy(3).  When 20 years from now, a programmer
reading a piece of code full of strncpy(3) wants to migrate to a sane
function like strlcpy(3) or strcpy(3), the programmer needs to
understand if the zeroing was purposeful or just accidental.  Because
by using strlcpy(3), it may start leaking some trailing data if the
trailing of the buffer is meaningful to some program.

> 
> For the sake of reference, I looked into a few big C and C++ projects to
> see how often a strncpy(3)-based snippet was used to produce a truncated
> copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> that I haven't filtered out vendored dependencies, so there's a little bit
> of double-counting.) It seems like most codebases that don't ban strncpy(3)
> use a derived snippet somewhere or another. Also, I found 3 instances in
> glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> checking the last character.

I know.  I've been rewriting the code handling strings in shadow-utils
for the last year, and ther was a lot of it.  I fixed several small bugs
in the process, so I recommend avoiding it.

> 
> So these two snippets really are widespread, especially among the long tail
> of smaller C and C++ applications and libraries that don't perform enough
> string manipulation that it warrants creating a custom set of more-
> foolproof wrapper functions (at least, in the opinion of their authors).



> Thus, since they're not going away, it would be useful for anyone reading
> the code to understand the concept behind how these two snippets work, that
> the only difference between the strncpy(3)'s special "character sequence"
> and an ordinary C string is an additional null terminator at the end of the
> destination buffer.

This is part of string_copying(7):

DESCRIPTION
   Terms (and abbreviations)
     string (str)
            is  a  sequence  of zero or more non‐null characters followed by a
            null byte.

     character sequence
            is a sequence of zero or  more  non‐null  characters.   A  program
            should  never use a character sequence where a string is required.
            However, with appropriate care, a string can be used in the  place
            of a character sequence.

I think that is very explicit in the difference.  strncpy(3) refers to
that page for understanding the differences, so I think it is
documented.

strncpy(3):
CAVEATS
     The  name  of  these  functions  is confusing.  These functions produce a
     null‐padded character sequence, not a string (see string_copying(7)).

> 
> In other words, strncpy(3) doesn't create a truncated string, but it
> creates something which can be easily turned into to a truncated string,
> and that's its most relevant quality for most of its uses in existing code.
> Further, apart from snprintf(3), there's no other portable way to produce a
> truncated string without manual arithmetic. Thus, I'd also find it

Portable is relative.  With libbsd, you can port to most POSIX systems.
Windows is another story.

> reasonable to highlight precisely why strncpy(3)'s output isn't a string

How about this?:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index d4c2ce83d..c80c8b640 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -108,7 +108,10 @@ .SH HISTORY
 .SH CAVEATS
 The name of these functions is confusing.
 These functions produce a null-padded character sequence,
-not a string (see
+not a string.
+While strings have a terminating NUL byte,
+character sequences do not have any terminating byte
+(see
 .BR string_copying (7)).
 .P
 It's impossible to distinguish truncation by the result of the call,


> (viz., the lack of a null terminator), instead of trying to insist that its
> output is worlds apart from anything string-related, especially given the
> volume of existing correct code that belies that notion.

It is not correct code.  That code is doing extra work which confuses
maintainers.  It is a lot like writing dead code, since you're writing
zeros that nobody is reading, which confuses maintainers.

Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
it's not correct code.  It's rather dangerous code that just happens to
not be vulnerable most of the time.

> 
> Or, to answer your question, "It's appropriate to keep using strncpy(3) in
> existing code where it's currently used as part of creating a truncated
> string, and it's not especially inappropriate to use strncpy(3) in new code
> as part of creating a truncated string, if the code must support platforms
> without strlcpy(3) or similar, and if the resulting snippets are few enough
> and well-commented enough that they create less mental load than creating
> and maintaining a custom helper function."

strncpy(3) calls are never well documented.  Do you add a comment in
each such call saying "this zeroing is superfluous"?  Probably not.

> 
> (As an aside, I find the remark in the man page that "It's impossible to
> distinguish truncation by the result of the call" extremely misleading at
> best, since truncation can easily be distinguished by inspecting the last
> output character.)

Again, strncpy(3)'s truncation is impossible to detect.  What you can
detect is that your construct that resembles strlcpy(3) truncates, which
is a different thing.

Thanks,
Alex

> 
> Thank you,
> Matthew House

-- 
<https://www.alejandro-colomar.es/>

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux