Re: strncpy clarify result may not be null terminated

Alejandro Colomar <alx@xxxxxxxxxx> · Sat, 11 Nov 2023 22:13:40 +0100

Hi Paul,

On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
> On 2023-11-10 11:52, Alejandro Colomar wrote:
> 
> > Do you have any numbers?
> 
> It depends on size of course. With programs like 'tar' (one of the few
> programs that actually needs something like strncpy) the destination buffer
> is usually fairly small (32 bytes or less) though some of them are 100
> bytes. I used 16 bytes in the following shell transcript:
> 
> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
> 
> strnlen+strcpy:
> 
> real	0m0.411s
> user	0m0.411s
> sys	0m0.000s
> 
> strnlen+memcpy:
> 
> real	0m0.392s
> user	0m0.388s
> sys	0m0.004s
> 
> strncpy:
> 
> real	0m0.300s
> user	0m0.300s
> sys	0m0.000s
> 
> stpncpy:
> 
> real	0m0.326s
> user	0m0.326s
> sys	0m0.000s
> 
> strlcpy:
> 
> real	0m0.623s
> user	0m0.623s
> sys	0m0.000s
> 
> 
> ... where a.out was generated by compiling the attached program with gcc -O2
> on Ubuntu 23.10 64-bit on a Xeon W-1350.
> 
> I wouldn't take these numbers all that seriously, as microbenchmarks like
> these are not that informative these days. Still, for a typical case one
> should not assume strncpy must be slower merely because it has more work to
> do; quite the contrary.

Thanks for the benchmarck!  Yeah, I won't take it as the last word, but
it shows the growth order (and its cause) of the different alternatives.

I'd like to point out some curious things about it:

-  strnlen+strcpy is slower than strnlen+memcpy.

   The compiler has all the information necessary here, so I don't see
   why it's not optimizing out the strcpy(3) into a simple memcpy(3).
   AFAICS, it's a missed optimization.  Even with -O3, it misses the
   optimization.

-  strncpy is slower than stpncpy in my computer.

   stpncpy is in fact the fastest call in my computer.

   Was strncpy(3) optimized in a recent version of glibc that you have?
   I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
   just luck?  I'm curious.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 16 100000000 abcdefghijk $i;
	  done;

	strnlen+strcpy:

	real	0m0.188s
	user	0m0.184s
	sys	0m0.004s

	strnlen+memcpy:

	real	0m0.148s
	user	0m0.148s
	sys	0m0.000s

	strncpy:

	real	0m0.157s
	user	0m0.157s
	sys	0m0.000s

	stpncpy:

	real	0m0.135s
	user	0m0.135s
	sys	0m0.000s

	memccpy:

	real	0m0.208s
	user	0m0.208s
	sys	0m0.000s

	strlcpy:

	real	0m0.322s
	user	0m0.322s
	sys	0m0.000s

-  strlcpy(3) is very heavy.  Much more than I expected.  See some tests
   with larger strings.  The main growth of strlcpy(3) comes from slen.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.242s
	user	0m0.242s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.186s
	sys	0m0.004s

	strncpy:

	real	0m0.174s
	user	0m0.173s
	sys	0m0.000s

	stpncpy:

	real	0m0.170s
	user	0m0.166s
	sys	0m0.004s

	memccpy:

	real	0m0.253s
	user	0m0.249s
	sys	0m0.004s

	strlcpy:

	real	0m1.385s
	user	0m1.385s
	sys	0m0.000s

-  strncpy(3) also gets heavy compared to strnlen+memcpy.
   Considering how small the difference with memcpy is for small
   strings, I wouldn't recommend it instead of memcpy, except for
   micro-optimizations.  The main growth of strncpy(3) comes from dsize.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.234s
	user	0m0.233s
	sys	0m0.001s

	strnlen+memcpy:

	real	0m0.192s
	user	0m0.192s
	sys	0m0.000s

	strncpy:

	real	0m0.268s
	user	0m0.268s
	sys	0m0.000s

	stpncpy:

	real	0m0.267s
	user	0m0.267s
	sys	0m0.000s

	memccpy:

	real	0m0.257s
	user	0m0.256s
	sys	0m0.001s

	strlcpy:

	real	0m1.574s
	user	0m1.574s
	sys	0m0.000s

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.227s
	user	0m0.227s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.190s
	sys	0m0.000s

	strncpy:

	real	0m1.400s
	user	0m1.399s
	sys	0m0.000s

	stpncpy:

	real	0m1.398s
	user	0m1.398s
	sys	0m0.000s

	memccpy:

	real	0m0.256s
	user	0m0.256s
	sys	0m0.000s

	strlcpy:

	real	0m1.184s
	user	0m1.184s
	sys	0m0.000s

-  strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over
   a few hundred bytes, and is only a few 10%'s slower than the fastest
   for smaller buffers.

   It is also the most semantically correct (together with
   strnlen+strcpy), avoiding unnecessary dead code (padding).  This
   should get the main backing from the manual pages.

   However, it can be useful to document typical alternatives to prevent
   mistakes from users.  Especially, since some micro-optimizations may
   favor uses of strncpy(3).

Cheers,
Alex   

> #include <stdlib.h>
> #include <string.h>
> 
> 
> int
> main (int argc, char **argv)
> {
>   if (argc != 5)
>     return 2;
>   long bufsize = atol (argv[1]);
>   char *buf = malloc (bufsize);
>   long n = atol (argv[2]);
>   char const *a = argv[3];
>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  if (strnlen (a, bufsize) == bufsize)
> 	    return 1;
> 	  strcpy (buf, a);
> 	}
>     }
>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  size_t alen = strnlen (a, bufsize);
> 	  if (alen == bufsize)
> 	    return 1;
> 	  memcpy (buf, a, alen + 1);
> 	}
>     }
>   else if (strcmp (argv[4], "strncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
> 	  return 1;
>     }
>   else if (strcmp (argv[4], "stpncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
> 	  return 1;
>     }

I've added the following one for completeness.  Especially now that
it'll be in C2x.

  else if (strcmp (argv[4], "memccpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (memccpy (buf, a, 0, bufsize) == NULL)
	  return 1;
    }

>   else if (strcmp (argv[4], "strlcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strlcpy (buf, a, bufsize) == bufsize)

This should have been >= bufsize, right?

> 	  return 1;
>     }
>   else
>     return 2;
> }

-- 
<https://www.alejandro-colomar.es/>
Attachment:
signature.asc

Description: PGP signature