Re: strncpy clarify result may not be null terminated

Jonny Grant <jg@xxxxxxxx> · Mon, 13 Nov 2023 23:46:10 +0000

On 12/11/2023 20:49, Paul Eggert wrote:
> [dropping libc-alpha since this is only about the man pages]
> 
> On 2023-11-12 02:59, Alejandro Colomar wrote:
> 
>> I think the man-pages should go
>> ahead and write wrapper functions such as strtcpy() and stpecpy()
>> around libc functions; these wrappers should provide a fast and safe
>> starting point for most programs.
> 
> It's OK for man pages to give these in EXAMPLES sections. However, the man pages currently go too far in this direction. Currently, if I type "man stpecpy", I get a man page with a synopsis and it looks to me like glibc supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do that, as stpecpy is merely a man-pages invention: although the source code for stpecpy is in the EXAMPLES section of string_copying(7), you can't use stpecpy in an app without copy-and-pasting the man page's source into your code.
> 
> It's not just stpecpy. For example, there is no ustr2stp function in glibc, but "man ustr2stp" acts as if there is one.
> 
> The man pages should describe the library that exists, not the library that some of us would rather have.
> 
> 
>> It's true that memcpy(3) is the fastest function one can use, but it
>> requires the programmer to be rather careful with the lengths of the
>> strings.  I don't think keeping track of all those little details is
>> what the common programmer should do.
> 
> Unfortunately, C is not designed for string use that's that convenient. If you want safe and efficient use of possibly-long C strings, keeping track of lengths is generally the best way to do it.
> 
> 
>>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
>>>
>>> memcpy (dest, src, size);
>>> dest[size - 1] = '\0';
>>
>> -1's in the source code make up for off-by-one bugs.
> 
> The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is defined to null-terminate the result if sz!=0, so that particular "-1" isn't a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't clear to me.)
> 
> That "last byte, twice" question is: why is the last argument to memcpy "size" and not "size - 1" which would be equally correct? The answer is performance: memcpy often works faster when copying a number of bytes that is a multiple of a smallish power of two, and "size" is more likely than "size - 1" to be such a multiple.
> 

Thank you for your reply. I see what you mean: many programmers think in round sizes and would make their dest buffer, say, 32 bytes, so when truncation occurs it makes sense to take advantage of that and copy quickly, even if that means writing the null terminator over the last byte just copied. Probably someone measured strlcpy on truncating calls and saw a lot of convenient power-of-2 sizes coming through.
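To make sure I have understood the truncation path, here is a minimal sketch built around the two lines quoted above. It is not the glibc source: the name sketch_strlcpy and the surrounding structure are my own assumptions, and only the memcpy plus "dest[size - 1]" pair comes from the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a sketch of strlcpy-like behaviour, not glibc's code. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    assert(src != NULL);
    size_t src_len = strlen(src);

    if (src_len < size) {
        /* The string fits: copy it together with its terminator. */
        memcpy(dst, src, src_len + 1);
    } else if (size > 0) {
        /* Truncating: copy a full `size` bytes (often a round number),
           then overwrite the last copied byte with the terminator. */
        memcpy(dst, src, size);
        dst[size - 1] = '\0';
    }

    /* Like strlcpy, report the source length, not the bytes written. */
    return src_len;
}
```

With an 8-byte buffer and a 12-character source, this stores 7 characters plus the terminator and returns 12, so the caller has to compare the result against the buffer size to notice the truncation.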

Personally, I'm not sure a truncated string is of much use. Since strlcpy already detects truncation, an API like this could just return an error and not copy partially. The programmer would then have a chance to realloc() and copy the full string.
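Roughly what I have in mind, as a sketch only. copy_or_fail is a hypothetical name, not an existing libc function, and returning -1 is just one possible error convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies all of src (with terminator) or nothing. */
int copy_or_fail(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t need = strlen(src) + 1;   /* bytes required, including '\0' */

    if (need > size)
        return -1;                   /* too small: dst is left untouched */

    memcpy(dst, src, need);
    return 0;
}
```

On a -1 result the caller still has the old contents of dst and can grow the buffer with realloc() before trying again, instead of carrying on with a silently shortened string.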

The strlcpy API returns src_length even when the string was truncated and it did not write src_length+1 bytes to dest, which is misleading. Shame strlcpy can't be [[deprecated]].

I'm sure everyone has read these posts about strlcpy before; I'm just sharing them while I remember:

Ulrich Drepper frowned upon strlcpy:
https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html

"This is horribly inefficient BSD crap.  Using these function only
leads to other errors.  Correct string handling means that you always
know how long your strings are and therefore you can you memcpy
(instead of strcpy).

Beside, those who are using strcat or variants deserved to be punished."

The rest of the thread is also interesting.

Kind regards, Jonny