On 12/12/22 00:59, Alejandro Colomar wrote:
Hi all!

I'm planning to add a new manual page that documents all string copying functions. It covers more detail than any of the existing manual pages (and in fact, I've discovered some properties of the documented functions while working on this page). The intention is to remove the existing separate manual pages for all string copying functions, and make them links to this new page. It is intended to be the only reference documentation for copying strings in C, and to hopefully fix the half century of suboptimal string copying library with which we've lived. (Say goodbye to std::string, here come back C strings ;)

The formatted manual page is below.

Alex

P.S.: I'm sorry for your beloved string copying function(s); it has a high chance of being dreaded by the page below. Not sorry. Oh well, at least I justified it, or I tried :-)

---

string_copy(7)         Miscellaneous Information Manual        string_copy(7)

NAME
       stpcpy, stpecpy, stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat, stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings

SYNOPSIS
   (Null-terminated) strings
       // Chain-copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Chain-copy a string with truncation (not in libc).
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain-copy a string with truncation and SIGSEGV on invalid input.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy a string with truncation and SIGSEGV on invalid input.
       [[deprecated]]  // Use stpecpyx() instead.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src, size_t sz);

       // Concatenate a string with truncation.
       [[deprecated]]  // Use stpecpyx() instead.
       size_t strlcat(char dst[restrict .sz], const char *restrict src, size_t sz);

       // Copy a string with truncation (not in libc).
       [[deprecated]]  // Use stpecpy() instead.
       ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz], size_t sz);

       // Copy a string.
       [[deprecated]]  // Use stpcpy(3) instead.
       char *strcpy(char *restrict dst, const char *restrict src);

       // Concatenate a string.
       [[deprecated]]  // Use stpcpy(3) instead.
       char *strcat(char *restrict dst, const char *restrict src);

   Unterminated strings (null-padded fixed-width buffers)
       // Zero a fixed-width buffer, and
       // copy a string with truncation into an unterminated string.
       char *stpncpy(char dst[restrict .sz], const char *restrict src, size_t sz);

       // Chain-copy an unterminated string into a string (not in libc).
       char *ustr2stp(char *restrict dst, const char src[restrict .sz], size_t sz);

       // Zero a fixed-width buffer, and
       // copy a string with truncation into an unterminated string.
       [[deprecated]]  // Use stpncpy(3) instead.
       char *strncpy(char dst[restrict .sz], const char *restrict src, size_t sz);

       // Concatenate an unterminated string into a string.
       [[deprecated]]  // Use ustr2stp() instead.
       char *strncat(char *restrict dst, const char src[restrict .sz], size_t sz);

   String structures
       // (Null-terminated) string structure.
       struct str_s {
           size_t  len;
           char   *str;
       };

       // Unterminated string structure (overlapping strings).
       struct ustr_s {
           size_t  len;
           char   *ustr;
       };

       // Chain-copy a string structure into an unterminated string.
       void *mempcpy(void *restrict dst, const void src[restrict .len], size_t len);

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is a sequence of zero or more non-null characters, followed by a null byte.

       unterminated string (ustr)
              is a sequence of zero or more non-null characters.  They are sometimes contained in fixed-width buffers, which usually contain padding null bytes after the unterminated string, to fill the rest of the buffer without affecting the unterminated string; however, those padding null bytes are not part of the unterminated string.

       length (len)
              is the number of non-null characters in a string.  It is the return value of strlen(str) and of strnlen(ustr, sz).

       size (sz)
              is the size of the entire buffer that contains the string.

       end
              is the name of a pointer to the terminating null byte of a string, or a pointer to one past the last character of an unterminated string.  This is the return value of functions that allow chaining.  It is equivalent to &str[len].

       past_end
              is the name of a pointer to one past the end of the buffer that contains a string.  It is equivalent to &str[sz].  It is used as a sentinel value, so that strings can be truncated instead of overrunning the buffer.

       string structure
       unterminated string structure
              Structure that contains the length of a string, as well as the string or the unterminated string.

   Types of functions
       Copy, concatenate, and chain-copy
              Originally, there was a distinction between functions that copy and those that concatenate.  However, newer functions that copy while allowing chaining cover both use cases with a single API.  They are also algorithmically faster, since they don't need to search for the end of the existing string.

              To allow chaining, these functions need to return a pointer to the end of the copied string.  That pointer is a byproduct of the copy operation, so returning it has no performance cost.  These functions are preferred over copy or concatenation functions.

              Functions that return such a pointer, and thus can be chained, have names of the form *stp*(), since it's also common to name the pointer just p.

       Truncate or not?
              The first thing to note is that programmers should be careful with buffers, so that they always have the correct size and truncation is not necessary.  In most cases, truncation is not desired, and it is simpler to just do the copy.  Simpler code is safer code.  Guarding against programming mistakes by adding more code just adds more places where mistakes can be made.

              Nowadays, many programmer errors can be detected with tools like compiler warnings, static analyzers, and _FORTIFY_SOURCE (see ftm(7)).  Keeping the code simple helps these error-detection features be more precise.

              When validating user input, however, it makes sense to truncate.
              Remember to check the return value of such function calls.

              Functions that truncate:

              •  stpecpy() is the most efficient string copy function that performs truncation.  It only requires checking for truncation once, after all chained calls.

              •  stpecpyx() is a variant of stpecpy() that consumes the entire source string, to catch bugs in the program by forcing a segmentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).

              •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD, are designed to crash if the input string is invalid (doesn't contain a null byte).

              •  strscpy(9) is a function in the Linux kernel which reports an error instead of crashing.

              •  stpncpy(3) and strncpy(3) also truncate, but they don't write strings; they write unterminated strings.

       Unterminated strings (null-padded fixed-width buffers)
              For historic reasons, some standard APIs, such as utmpx(5), use unterminated strings in fixed-width buffers.  To interface with them, specialized functions need to be used.

              To copy strings into them, use stpncpy(3).

              To copy from an unterminated string within a fixed-width buffer into a string, ignoring any trailing null bytes in the source fixed-width buffer, you should use ustr2stp().

       String structures
              The simplest string copying function is mempcpy(3).  It requires always knowing the length of your strings, for which string structures can be used.  Using them makes the code simpler, since you always know the length of your strings, and also faster, since the lengths don't need to be calculated repeatedly.  mempcpy(3) always creates an unterminated string, so you need to explicitly set the terminating null byte.
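              As a sketch of the string-structure approach, the following assumes glibc's mempcpy(3) (a GNU extension, hence _GNU_SOURCE) and defines strs2stp(), a hypothetical helper that is not part of any library.  It chain-copies an array of struct str_s into a string without calling strlen(3), and then writes the terminating null byte that mempcpy(3) leaves out.

```c
#define _GNU_SOURCE  /* for mempcpy(3), a GNU extension */
#include <stddef.h>
#include <string.h>

/* Same layout as struct str_s in the SYNOPSIS. */
struct str_s {
    size_t  len;
    char   *str;
};

/*
 * Hypothetical helper (not in any library): chain-copy n string
 * structures into dst, which must be large enough to hold the sum of
 * their lengths plus a terminating null byte.  No truncation is done.
 * Returns a pointer to the terminating null byte, so the result can
 * itself be chained.
 */
char *
strs2stp(char *restrict dst, const struct str_s *restrict src, size_t n)
{
    char  *p = dst;

    for (size_t i = 0; i < n; i++)
        p = mempcpy(p, src[i].str, src[i].len);  /* lengths are known: no strlen(3) */

    *p = '\0';  /* mempcpy(3) never writes the null byte itself */
    return p;
}
```

              For example, with parts holding "Hello ", "world", and "!" (lengths 6, 5, and 1), strs2stp(buf, parts, 3) leaves "Hello world!" in buf and returns buf + 12.  Since the total length is known in advance, the caller can size buf exactly instead of relying on truncation.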
              String structure
                     The following code can be used to chain-copy from a string structure into a string:

                         p = mempcpy(p, src->str, src->len);
                         *p = '\0';

                     The following code can be used to chain-copy from a string structure into an unterminated string:

                         p = mempcpy(p, src->str, src->len);

              Unterminated string structure (overlapping strings)
                     In programs that make considerable use of strings, and need the best performance, using overlapping strings can make a big difference.  It allows holding substrings of a bigger string while neither duplicating memory nor spending time on a copy.

                     However, this is delicate, since it requires using unterminated strings.  C library APIs use strings, so programs that use unterminated strings will have to take care to differentiate strings from unterminated strings.

                     The following code can be used to chain-copy from an unterminated string structure to a string:

                         p = mempcpy(p, src->ustr, src->len);
                         *p = '\0';

                     The following code can be used to chain-copy from an unterminated string structure to an unterminated string:

                         p = mempcpy(p, src->ustr, src->len);

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.  The programmer is responsible for allocating a buffer large enough.  It returns a pointer suitable for chaining.

       stpecpy()
       stpecpyx()
              These functions copy the input string into a destination string.  If the destination buffer, limited by a pointer to one past the end of it, isn't large enough to hold the copy, the resulting string is truncated (but it is guaranteed to be null-terminated).  They return a pointer suitable for chaining.  Truncation needs to be detected only once, after the last chained call.

              stpecpyx() has identical semantics to stpecpy(), except that it forces a SIGSEGV on Undefined Behavior.

              These functions are not provided by any library, but you can define them with the following reference implementations:

                  /* This code is in the public domain. */

                  #include <signal.h>
                  #include <string.h>

                  char *
                  stpecpy(char *dst, char past_end[0], const char *restrict src)
                  {
                      char  *p;

                      if (dst == past_end)
                          return past_end;

                      p = memccpy(dst, src, '\0', past_end - dst);
                      if (p != NULL)
                          return p - 1;

                      /* truncation detected */
                      past_end[-1] = '\0';
                      return past_end;
                  }

                  /* This code is in the public domain. */

                  char *
                  stpecpyx(char *dst, char past_end[0], const char *restrict src)
                  {
                      /* strlen(3) traverses the whole source string, so an
                         unterminated source is read past its end (and is
                         expected to crash) instead of being truncated. */
                      if (src[strlen(src)] != '\0')
                          raise(SIGSEGV);

                      return stpecpy(dst, past_end, src);
                  }

       stpncpy(3)
              This function copies the input string into a destination null-padded fixed-width unterminated string.  If the destination buffer, limited by its size, isn't large enough to hold the copy, the resulting string is truncated.  Since it creates an unterminated string, it doesn't need to write a terminating null byte.  It returns a pointer suitable for chaining, but it's not ideal for that.  Truncation needs to be detected only once, after the last chained call.

              If you're going to use this function in chained calls, it would probably be useful to develop a function similar to stpecpy().

       ustr2stp()
              This function copies the input unterminated string contained in a null-padded fixed-width buffer into a destination (null-terminated) string.  The programmer is responsible for allocating a buffer large enough.  It returns a pointer suitable for chaining.

              A truncating version of this function doesn't exist: the size of the source buffer is always known, so the destination can always be made large enough, and truncation wouldn't be very useful.

              This function is not provided by any library, but you can define it with the following reference implementation:

                  /* This code is in the public domain. */

                  #include <string.h>

                  char *
                  ustr2stp(char *restrict dst, const char *restrict src, size_t sz)
                  {
                      char  *end;

                      end = memccpy(dst, src, '\0', sz);
                      if (end != NULL)
                          return end - 1;  /* src was null-padded; its '\0' was copied */

                      /* src filled the whole buffer; terminate dst explicitly */
                      end = dst + sz;
                      *end = '\0';
                      return end;
                  }

       mempcpy(3)
              This function copies the input string, limited by its length, into a destination unterminated string.  The programmer is responsible for allocating a buffer large enough.  It returns a pointer suitable for chaining.

   Deprecated functions
       strlcpy(3bsd)
       strlcat(3bsd)
              Deprecated.  These functions copy (strlcpy(3bsd)) or append (strlcat(3bsd)) the input string into a destination string.  If the destination buffer, limited by its size, isn't large enough to hold the copy, the resulting string is truncated (but it is guaranteed to be null-terminated).  They return the length of the total string they tried to create.  These functions force a SIGSEGV on Undefined Behavior.

              stpecpyx() is a better replacement for these functions for the following reasons:

              •  Better performance (chain copy instead of concatenating).

              •  Only requires detecting truncation once per chain of calls.

       strscpy(9)
              Deprecated.  This function copies the input string into a destination string.  If the destination buffer, limited by its size, isn't large enough to hold the copy, the resulting string is truncated (but it is guaranteed to be null-terminated).  It returns the length of the destination string, or -E2BIG on truncation.

              stpecpy() is a better replacement for this function, since it has a much simpler interface.

       strcpy(3)
       strcat(3)
              Deprecated.  These functions copy (strcpy(3)) or append (strcat(3)) the input string into a destination string.  The programmer is responsible for allocating a buffer large enough.  The return value is useless.

              strcpy(3) is identical to stpcpy(3) except for the useless return value.

              stpcpy(3) is a better replacement for these functions for the following reasons:

              •  Better performance (chain copy instead of concatenating).

              •  No need to call strlen(3), thanks to the useful return value.

       strncpy(3)
              Deprecated.  strncpy(3) is identical to stpncpy(3) except for the useless return value.  Due to that return value, it's hard to correctly check for truncation with this function.  Use stpncpy(3) instead.

       strncat(3)
              Deprecated.  Do not confuse this function with strncpy(3); they are not related at all.
              This function concatenates the input unterminated string contained in a null-padded fixed-width buffer into a destination (null-terminated) string.  The programmer is responsible for allocating a buffer large enough.  The return value is useless.

              ustr2stp() is a better replacement for this function for the following reasons:

              •  Better performance (chain copy instead of concatenating).

              •  No need to call strlen(3), thanks to the useful return value.

              •  Function name that is not actively confusing.

RETURN VALUE
       The following functions return a pointer to the terminating null byte in the destination string (they never truncate).

       •  stpcpy(3)

       •  ustr2stp()

       The following function returns a pointer to one past the last byte it wrote to the destination (it never truncates, and it never writes a terminating null byte).

       •  mempcpy(3)

       The following functions return a pointer to the terminating null byte in the destination string, except when truncation occurs; if truncation occurs, they return a pointer to one past the end of the destination buffer.

       •  stpecpy()

       •  stpecpyx()

       The following function returns a pointer to one after the last character in the destination unterminated string; if truncation occurs, that pointer is equivalent to a pointer to one past the end of the destination buffer.

       •  stpncpy(3)

   Deprecated
       The following functions return the length of the total string that they tried to create (as if truncation didn't occur).

       •  strlcpy(3bsd)

       •  strlcat(3bsd)

       The following function returns the length of the destination string, or -E2BIG on truncation.

       •  strscpy(9)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3)

       •  strcat(3)

       •  strncpy(3)

       •  strncat(3)

CAVEATS
       Some of the functions described here are not provided by any library; you should write your own copy if you want to use them.

       The deprecated status of these functions varies from system to system.  This page declares as deprecated those functions that have a better replacement documented in this same page.

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
           p = buf;
           p = stpcpy(p, "Hello ");
           p = stpcpy(p, "world");
           p = stpcpy(p, "!");
           len = p - buf;
           puts(buf);

       stpecpy()
       stpecpyx()
           past_end = buf + sizeof(buf);
           p = buf;
           p = stpecpy(p, past_end, "Hello ");
           p = stpecpy(p, past_end, "world");
           p = stpecpy(p, past_end, "!");
           if (p == past_end) {
               p--;
               goto toolong;
           }
           len = p - buf;
           puts(buf);

       stpncpy(3)
           past_end = buf + sizeof(buf);
           end = stpncpy(buf, "Hello world!", sizeof(buf));
           if (end == past_end)
               goto toolong;
           len = end - buf;
           for (size_t i = 0; i < sizeof(buf); i++)
               putchar(buf[i]);

       ustr2stp()
           p = buf;
           p = ustr2stp(p, "Hello ", 6);
           p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
           p = ustr2stp(p, "!", 1);
           len = p - buf;
           puts(buf);

       mempcpy(3)
           p = buf;
           p = mempcpy(p, "Hello ", 6);
           p = mempcpy(p, "world", 5);
           p = mempcpy(p, "!", 1);
           *p = '\0';
           len = p - buf;
           puts(buf);

   Deprecated
       strlcpy(3bsd)
       strlcat(3bsd)
           if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
               goto toolong;
           if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
               goto toolong;
           len = strlcat(buf, "!", sizeof(buf));
           if (len >= sizeof(buf))
               goto toolong;
           puts(buf);

       strscpy(9)
           len = strscpy(buf, "Hello world!", sizeof(buf));
           if (len == -E2BIG)
               goto toolong;
           puts(buf);

       strcpy(3)
       strcat(3)
           strcpy(buf, "Hello ");
           strcat(buf, "world");
           strcat(buf, "!");
           len = strlen(buf);
           puts(buf);

       strncpy(3)
           strncpy(buf, "Hello world!", sizeof(buf));
           if (buf[sizeof(buf) - 1] != '\0')
               goto toolong;
           len = strnlen(buf, sizeof(buf));
           for (size_t i = 0; i < sizeof(buf); i++)
               putchar(buf[i]);

       strncat(3)
           strncpy(buf, "Hello ", 6);
           strncat(buf, "world", 42);  // Padding null bytes ignored.
           strncat(buf, "!", 1);
           puts(buf);
Oops, that example was mistaken; too much cut and paste.

       strncat(3)
           buf[0] = '\0';
           strncat(buf, "Hello ", 6);
           strncat(buf, "world", 42);  // Padding null bytes ignored.
           strncat(buf, "!", 1);
           len = strlen(buf);
           puts(buf);
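
A complete sketch of chained stpecpy() calls with a single truncation check follows.  It repeats the reference implementation from this page, but spells past_end as a plain char * instead of char past_end[0] (ISO C does not allow zero-length array declarators; GCC accepts them as an extension).  greet() is a hypothetical function used only for illustration, and _GNU_SOURCE is assumed so that glibc declares memccpy(3).

```c
#define _GNU_SOURCE  /* make sure memccpy(3) is declared */
#include <stddef.h>
#include <string.h>

/* Reference stpecpy() from this page, with past_end as a plain pointer. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    if (dst == past_end)
        return past_end;  /* an earlier call already truncated */

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;  /* points to the copied '\0' */

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical example: write "Hello, <name>!" into buf (size > 0).
 * Returns the length of the resulting string, or -1 on truncation.
 * Truncation is checked only once, after the last chained call.
 */
ptrdiff_t
greet(char *buf, size_t size, const char *name)
{
    char  *p, *past_end;

    past_end = buf + size;
    p = buf;
    p = stpecpy(p, past_end, "Hello, ");
    p = stpecpy(p, past_end, name);
    p = stpecpy(p, past_end, "!");
    if (p == past_end)
        return -1;  /* buf holds a truncated, null-terminated string */

    return p - buf;
}
```

With a 32-byte buffer, greet(buf, sizeof(buf), "world") returns 13 and buf holds "Hello, world!".  A 14-byte buffer is an exact fit and also succeeds, while an 8-byte buffer returns -1 and leaves the truncated string "Hello, " in buf.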

SEE ALSO
       memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man-pages (unreleased)          (date)                    string_copy(7)
-- <http://www.alejandro-colomar.es/>