Hey Lennart! On 2023-07-28 21:22, Lennart Jablonka wrote: > A function can't check whether a pointer points to the start of a > string. What it certainly can do is to keep reading until you either > find a null byte or read the secret key that lies adjacent in memory and > post it to your favorite mailing list. > > strlcpy and strlcat behave the exact same way any other function > accepting a string behaves: If you don't pass a string, the behavior is > undefined. And that, I believe, does not deserve a special mention > here, seeing as all the other string functions don't get such a mention > either. Hmm, you're right. What I intended to mean is that while most other functions --e.g., strcpy(3)-- overwrite after the buffer, the design of strlcpy(3) is a bit more clever and makes it so that when the caller invokes UB, it tries to exploit that UB in a way that the input string is entirely read before starting to write, which makes it more likely to crash in a read, rather than writing to random memory (which might still happen if the read is not enough to crash, though). But it's true that it's not magic, and the UB is still there, so I agree in removing that. It's dangerous to make reader believe that it can avoid UB, so it's preferable to not mention that at all. > > Signed-off-by: Lennart Jablonka <humm@xxxxxxxxx> > --- > Hey Alex! > > I don't dislike string_copying(7) overall. This is one of the parts of > the content that I dislike---it is false, after all. Besides that: > > The "definitions" at the top don't make it clear enough that they aren't > supposed to be precise definitions used in your usual C jargon; that > while string and a string's length and an object's size are defined > by C, and while you understand and sometimes use most of these terms, > there is no norm that says "When you talk about a pointer to one past > a buffer's last byte, you call it 'end'!" That there is no norm that > says "When you say 'copy,' you write to the beginning, not to > elsewhere!" True. My intention was to settle the jargon and pseudo-standardize these terms (of course in decades, not tomorrow). Every other project uses a different term, and I'd like to unify. > > Furthermore, the terminology around "character sequences" confused me > while reading the page. When do you have a buffer, neither > null-terminated nor null-padded, that is defined not to contain null > bytes? NGINX uses these internally: $ grepc ngx_str_t ./src/core/ngx_string.h:16: typedef struct { size_t len; u_char *data; } ngx_str_t; Basically it's a non-zero buffer plus its length. They have interesting properties; for example, you can take a substring (or should I call it sub-sequence) just by taking a pointer to somewhere in the middle, and the length of the substring, without really copying the string. > And how do functions behave that want a character sequence if > that does contain a null byte? Do they take the null byte to signal the > character sequence's end? Need they accept the null byte as part of the > character sequence? Is the behavior undefined? NGINX handles these strings by the length stored in the buffer. Any null byte in the middle of a string would be treated as any other character, although they would be problematic when interfacing libc; in general, care is taken to not have null bytes in those strings. NGINX uses mempcpy(3) (or rather, ngx_cpymem(), which is the same thing) to copy these things, or other more sophisticated functions and macros based on mempcpy(3). $ grepc ngx_cpymem ./src/core/ngx_string.h:97: #define ngx_cpymem(dst, src, n) (((u_char *) ngx_memcpy(dst, src, n)) + (n)) ./src/core/ngx_string.h:107: #define ngx_cpymem(dst, src, n) (((u_char *) memcpy(dst, src, n)) + (n)) > > And lastly, the man page doesn't list the functions' standards or who > invented them. That was deliberate. The specific pages of each of those functions already documents that. Since the intention was to differentiate the use cases of each of the functions, I believe mentioning the standards would just deviate from that main point, and so I omitted that info. The point is that choosing one of these functions shouldn't depend on what standards are available to the programmer. Instead, the programmer should use the appropriate function, and then if it's not available, it should be written within the project (probably as a wrapper around other functions) to be able to use it. That's why I provided some naive implementations of some of them. Thanks for your opinion and review of the page! > > man7/string_copying.7 | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/man7/string_copying.7 b/man7/string_copying.7 > index 04426ef77..308cada36 100644 > --- a/man7/string_copying.7 > +++ b/man7/string_copying.7 > @@ -223,8 +223,7 @@ It only requires to check for truncation once after all chained calls. > .BR strlcpy (3bsd) > and > .BR strlcat (3bsd) > -are designed to crash if the input string is invalid > -(doesn't contain a terminating null byte). > +are similar, but less efficient when chained. Ok. > .IP \[bu] > .BR stpncpy (3) > and > @@ -410,9 +409,6 @@ isn't large enough to hold the copy, > the resulting string is truncated > (but it is guaranteed to be null-terminated). > They return the length of the total string they tried to create. > -These functions force a SIGSEGV if the > -.I src > -pointer is not a string. Ok. Patch applied. Thanks! Cheers, Alex > .IP > .BR stpecpy (3) > is a simpler alternative to these functions. -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature