Re: strncpy clarify result may not be null terminated

Jonny Grant <jg@xxxxxxxx> · Sun, 12 Nov 2023 09:52:20 +0000

On 11/11/2023 21:13, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
>> On 2023-11-10 11:52, Alejandro Colomar wrote:
>>
>>> Do you have any numbers?
>>
>> It depends on size of course. With programs like 'tar' (one of the few
>> programs that actually needs something like strncpy) the destination buffer
>> is usually fairly small (32 bytes or less) though some of them are 100
>> bytes. I used 16 bytes in the following shell transcript:
>>
>> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
>> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
>>
>> strnlen+strcpy:
>>
>> real	0m0.411s
>> user	0m0.411s
>> sys	0m0.000s
>>
>> strnlen+memcpy:
>>
>> real	0m0.392s
>> user	0m0.388s
>> sys	0m0.004s
>>
>> strncpy:
>>
>> real	0m0.300s
>> user	0m0.300s
>> sys	0m0.000s
>>
>> stpncpy:
>>
>> real	0m0.326s
>> user	0m0.326s
>> sys	0m0.000s
>>
>> strlcpy:
>>
>> real	0m0.623s
>> user	0m0.623s
>> sys	0m0.000s
>>
>>
>> ... where a.out was generated by compiling the attached program with gcc -O2
>> on Ubuntu 23.10 64-bit on a Xeon W-1350.
>>
>> I wouldn't take these numbers all that seriously, as microbenchmarks like
>> these are not that informative these days. Still, for a typical case one
>> should not assume strncpy must be slower merely because it has more work to
>> do; quite the contrary.
> 
> Thanks for the benchmarck!  Yeah, I won't take it as the last word, but
> it shows the growth order (and its cause) of the different alternatives.
> 
> I'd like to point out some curious things about it:
> 
> -  strnlen+strcpy is slower than strnlen+memcpy.
> 
>    The compiler has all the information necessary here, so I don't see
>    why it's not optimizing out the strcpy(3) into a simple memcpy(3).
>    AFAICS, it's a missed optimization.  Even with -O3, it misses the
>    optimization.
> 
> -  strncpy is slower than stpncpy in my computer.
> 
>    stpncpy is in fact the fastest call in my computer.
> 
>    Was strncpy(3) optimized in a recent version of glibc that you have?
>    I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
>    just luck?  I'm curious.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 16 100000000 abcdefghijk $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.188s
> 	user	0m0.184s
> 	sys	0m0.004s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.148s
> 	user	0m0.148s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.157s
> 	user	0m0.157s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.135s
> 	user	0m0.135s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.208s
> 	user	0m0.208s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m0.322s
> 	user	0m0.322s
> 	sys	0m0.000s
> 
> -  strlcpy(3) is very heavy.  Much more than I expected.  See some tests
>    with larger strings.  The main growth of strlcpy(3) comes from slen.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.242s
> 	user	0m0.242s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.186s
> 	sys	0m0.004s
> 
> 	strncpy:
> 
> 	real	0m0.174s
> 	user	0m0.173s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.170s
> 	user	0m0.166s
> 	sys	0m0.004s
> 
> 	memccpy:
> 
> 	real	0m0.253s
> 	user	0m0.249s
> 	sys	0m0.004s
> 
> 	strlcpy:
> 
> 	real	0m1.385s
> 	user	0m1.385s
> 	sys	0m0.000s
> 
> -  strncpy(3) also gets heavy compared to strnlen+memcpy.
>    Considering how small the difference with memcpy is for small
>    strings, I wouldn't recommend it instead of memcpy, except for
>    micro-optimizations.  The main growth of strncpy(3) comes from dsize.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.234s
> 	user	0m0.233s
> 	sys	0m0.001s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.192s
> 	user	0m0.192s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.268s
> 	user	0m0.268s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.267s
> 	user	0m0.267s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.257s
> 	user	0m0.256s
> 	sys	0m0.001s
> 
> 	strlcpy:
> 
> 	real	0m1.574s
> 	user	0m1.574s
> 	sys	0m0.000s
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.227s
> 	user	0m0.227s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.190s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m1.400s
> 	user	0m1.399s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m1.398s
> 	user	0m1.398s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.256s
> 	user	0m0.256s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m1.184s
> 	user	0m1.184s
> 	sys	0m0.000s
> 
> 
> -  strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over
>    a few hundred bytes, and is only a few 10%'s slower than the fastest
>    for smaller buffers.
> 
>    It is also the most semantically correct (together with
>    strnlen+strcpy), avoiding unnecessary dead code (padding).  This
>    should get the main backing from the manual pages.
> 
>    However, it can be useful to document typical alternatives to prevent
>    mistakes from users.  Especially, since some micro-optimizations may
>    favor uses of strncpy(3).
> 
> Cheers,
> Alex   
> 
>> #include <stdlib.h>
>> #include <string.h>
>>
>>
>> int
>> main (int argc, char **argv)
>> {
>>   if (argc != 5)
>>     return 2;
>>   long bufsize = atol (argv[1]);
>>   char *buf = malloc (bufsize);
>>   long n = atol (argv[2]);
>>   char const *a = argv[3];
>>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  if (strnlen (a, bufsize) == bufsize)
>> 	    return 1;
>> 	  strcpy (buf, a);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  size_t alen = strnlen (a, bufsize);
>> 	  if (alen == bufsize)
>> 	    return 1;
>> 	  memcpy (buf, a, alen + 1);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
>> 	  return 1;
>>     }
>>   else if (strcmp (argv[4], "stpncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
>> 	  return 1;
>>     }
> 
> I've added the following one for completeness.  Especially now that
> it'll be in C2x.
> 
>   else if (strcmp (argv[4], "memccpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (memccpy (buf, a, 0, bufsize) == NULL)
> 	  return 1;
>     }
> 
>>   else if (strcmp (argv[4], "strlcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strlcpy (buf, a, bufsize) == bufsize)
> 
> This should have been >= bufsize, right?
> 
>> 	  return 1;
>>     }
>>   else
>>     return 2;
>> }
> 
> 

Maybe we're gonna need a bigger benchmark.

Probably there existing studies. Or could patch something like SQLite Benchmark to utilise each string function just for measurements. Hopefully it moves around at least 2GB of strings to give some meaningful comparison timings.

As Paul mentioned, strlcpy is a poor choice for processing strings. Could rely on their guidance as they already measured.
https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html

Maybe the strlcpy API is easier, safer for programmers; but the compiler can't figure out that the programmer already knew src string length. So the strlcpy does a strlen() and wastes time reading over memory. If the src length is known, can just memcpy.

When I've benchmarked things, reducing the memory accesses for read, write boosted performance, also looked at the cycles taken, of course cache and alignment all play a part too.

Maybe could suggest in your man page programmers should keep track of the src size ? - to save the cost of the strlen().

At least the strlen functions are optimized:
glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time.
glibc/strlen.c searches 4 bytes at a time.

glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?

memcpy (dest, src, size);
dest[size - 1] = '\0';

Kind regards, Jonny