Re: string_copy(7): New manual page documenting string copying functions.

On 12/12/22 00:59, Alejandro Colomar wrote:
Hi all!

I'm planning to add a new manual page that documents all string copying functions.  It covers more detail than any of the existing manual pages (and in fact, I've discovered some properties of the functions documented while working on this page).  The intention is to remove the existing separate manual pages for all string copying functions, and make them links to this new page.  It intends to be the only reference documentation for copying strings in C, and hopefully fix the half century of suboptimal string copying library with which we've lived.  (Say goodbye to std::string, here come back C strings ;)

The formatted manual page is below.

Alex

P.S.: I'm sorry for your beloved string copying function(s); it has high chances of being dreaded by the page below.  Not sorry.  Oh well, at least I justified it, or I tried :-)

---

string_copy(7)         Miscellaneous Information Manual         string_copy(7)

NAME
        stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
        stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings

SYNOPSIS
    (Null‐terminated) strings
        // Chain‐copy a string.
        char *stpcpy(char *restrict dst, const char *restrict src);

        // Chain‐copy a string with truncation (not in libc).
        char *stpecpy(char *dst, char past_end[0], const char *restrict src);

        // Chain‐copy a string with truncation and SIGSEGV on invalid input.
        char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

        // Copy a string with truncation and SIGSEGV on invalid input.
        [[deprecated]]  // Use stpecpyx() instead.
        size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Concatenate a string with truncation.
        [[deprecated]]  // Use stpecpyx() instead.
        size_t strlcat(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Copy a string with truncation (not in libc).
        [[deprecated]]  // Use stpecpy() instead.
        ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                       size_t sz);

        // Copy a string.
        [[deprecated]]  // Use stpcpy(3) instead.
        char *strcpy(char *restrict dst, const char *restrict src);

        // Concatenate a string.
        [[deprecated]]  // Use stpcpy(3) instead.
        char *strcat(char *restrict dst, const char *restrict src);

    Unterminated strings (null‐padded fixed‐width buffers)
        // Zero a fixed‐width buffer, and
        // copy a string with truncation into an unterminated string.
        char *stpncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Chain‐copy an unterminated string into a string (not in libc).
        char *ustr2stp(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

        // Zero a fixed‐width buffer, and
        // copy a string with truncation into an unterminated string.
        [[deprecated]]  // Use stpncpy(3) instead.
        char *strncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Concatenate an unterminated string into a string.
        [[deprecated]]  // Use ustr2stp() instead.
        char *strncat(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

    String structures
        // (Null‐terminated) string structure.
        struct str_s {
            size_t  len;
            char    *str;
        };

        // Unterminated string structure (overlapping strings).
        struct ustr_s {
            size_t  len;
            char    *ustr;
        };

        // Chain‐copy a string structure into an unterminated string.
        void *mempcpy(void *restrict dst, const void src[restrict len],
                       size_t len);

DESCRIPTION
    Terms (and abbreviations)
        string (str)
               is a sequence of zero or more non‐null characters, followed by a
               null byte.

        unterminated string (ustr)
               is a sequence of zero or more  non‐null  characters.   They  are
               sometimes  contained  in fixed‐width buffers, which usually con‐
               tain padding null bytes after the unterminated string,  to  fill
               the  rest  of  the  buffer  without  affecting  the unterminated
               string; however, those padding null bytes are not  part  of  the
               unterminated string.

        length (len)
               is the number of non‐null characters in a string.  It is the re‐
               turn value of strlen(str) and of strnlen(ustr, sz).

        size (sz)
               refers to the entire buffer where the string is contained.

        end    is  the  name  of  a  pointer  to the terminating null byte of a
               string, or a pointer to one past the last character of an unter‐
               minated string.  This is the return value of functions that  al‐
               low chaining.  It is equivalent to &str[len].

        past_end
               is  the name of a pointer to one past the end of the buffer that
               contains a string.  It is equivalent to &str[sz].  It is used as
               a sentinel value, to be able  to  truncate  strings  instead  of
               overrunning a buffer.

        string structure
        unterminated string structure
               Structure  that  contains the length of a string, as well as the
               string or the unterminated string.
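
        As a rough illustration of the terms above, the following sketch
        relates len, sz, end, and past_end for a string held in a buffer.  The
        helper names (str_end, buf_past_end, room_left) are hypothetical and
        exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: end, i.e. a pointer to the terminating null byte. */
char *
str_end(char *str)
{
    return str + strlen(str);        /* &str[len] */
}

/* Hypothetical helper: past_end, i.e. one past the end of the buffer. */
char *
buf_past_end(char *buf, size_t sz)
{
    return buf + sz;                 /* &buf[sz] */
}

/* Hypothetical helper: how many more non-null characters fit in buf. */
size_t
room_left(char *buf, size_t sz)
{
    /* Distance from end to past_end, minus the terminating null byte. */
    return (size_t) (buf_past_end(buf, sz) - str_end(buf)) - 1;
}
```

        With char buf[16] = "Hello", len is 5, sz is 16, end is &buf[5],
        past_end is &buf[16], and room_left() reports 10.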

    Types of functions
        Copy, concatenate, and chain‐copy
               Originally, there was a distinction between functions that  copy
               and  those that concatenate.  However, newer functions that copy
               while allowing chaining cover both use cases with a single  API.
               They  are  also algorithmically faster, since they don’t need to
               search for the end of the existing string.

               To chain copy functions, they need to return a  pointer  to  the
               end.   That’s  a  byproduct  of the copy operation, so it has no
               performance costs.  These functions are preferred over  copy  or
               concatenation  functions.  Functions that return such a pointer,
               and thus can be chained, have names of the form  *stp*(),  since
               it’s also common to name the pointer just p.
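
               The difference can be sketched as follows, assuming a libc that
               provides stpcpy(3).  Both functions (whose names are hypotheti‐
               cal) build the same string; the caller is assumed to pass a
               buffer of at least 13 bytes.

```c
#define _XOPEN_SOURCE 700  /* for stpcpy(3) */
#include <stddef.h>
#include <string.h>

/* Concatenation style: each strcat(3) call rescans dst to find its end. */
size_t
build_concat(char *dst)
{
    strcpy(dst, "Hello ");
    strcat(dst, "world");
    strcat(dst, "!");
    return strlen(dst);  /* One more scan to learn the length. */
}

/* Chain-copy style: each call resumes at the end returned by the last one. */
size_t
build_chain(char *dst)
{
    char *p;

    p = dst;
    p = stpcpy(p, "Hello ");
    p = stpcpy(p, "world");
    p = stpcpy(p, "!");
    return p - dst;      /* The length falls out of pointer arithmetic. */
}
```

               Both return 12 and leave "Hello world!" in dst; only the first
               one walks the already‐copied bytes again on every call.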

        Truncate or not?
               The  first  thing  to note is that programmers should be careful
               with buffers, so they always have the correct size, and  trunca‐
               tion is not necessary.

               In  most  cases, truncation is not desired, and it is simpler to
               just do the copy.  Simpler  code  is  safer  code.   Programming
               against  programming mistakes by adding more code just adds more
               points where mistakes can be made.

               Nowadays, compilers can detect most programmer errors with  fea‐
               tures    like   compiler   warnings,   static   analyzers,   and
               _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
               these error‐detection features be more precise.

               When validating user input, however, it makes sense to truncate.
               Remember to check the return value of such function calls.

               Functions that truncate:

               •  stpecpy()  is  the  most  efficient string copy function that
                  performs truncation.  It requires checking for truncation only
                  once, after all chained calls.

               •  stpecpyx() is a variant of stpecpy() that consumes the entire
                  source string, to catch bugs in the program by forcing a seg‐
                  mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).

               •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
                  are designed to crash if the input string is invalid (doesn’t
                  contain a null byte).

               •  strscpy(9) is a function in the Linux kernel which reports an
                  error instead of crashing.

               •  stpncpy(3) and strncpy(3) also truncate, but they write
                  unterminated strings rather than strings.

    Unterminated strings (null‐padded fixed‐width buffers)
        For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
        minated strings in fixed‐width buffers.  To interface with  them,  spe‐
        cialized functions need to be used.

        To copy strings into them, use stpncpy(3).

        To  copy from an unterminated string within a fixed‐width buffer into a
        string, ignoring any trailing null  bytes  in  the  source  fixed‐width
        buffer, you should use ustr2stp().
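
        A minimal sketch of writing into such a buffer, assuming a hypothetical
        record type with an 8‐byte field (struct rec and both helper names are
        made up for this illustration, loosely in the style of utmpx(5)):

```c
#define _XOPEN_SOURCE 700  /* for stpncpy(3) and strnlen(3) */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct rec {
    char  name[8];
};

/*
 * Store src into the field, zeroing the unused tail.  Returns true if the
 * field ended up completely full, which means that src was truncated or
 * fit exactly; callers that cannot tell the two apart treat it as too long.
 */
bool
rec_set_name(struct rec *r, const char *src)
{
    char *end;

    end = stpncpy(r->name, src, sizeof(r->name));
    return end == r->name + sizeof(r->name);
}

/* Length of the unterminated string; padding null bytes are not counted. */
size_t
rec_name_len(const struct rec *r)
{
    return strnlen(r->name, sizeof(r->name));
}
```

        Note that r->name must never be passed to functions that expect a
        string, such as strlen(3) or puts(3), since it may lack a terminating
        null byte.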

    String structures
        The simplest string copying function is mempcpy(3).  It requires always
        knowing  the length of your strings, for which string structures can be
        used.  It makes the code simpler, since you always know the  length  of
        your strings, and it’s also faster, since it doesn’t need to repeatedly
        calculate  those  lengths.   mempcpy(3)  always creates an unterminated
        string, so you need to explicitly set the terminating null byte.

        String structure
               The following code can be  used  to  chain‐copy  from  a  string
               structure into a string:

                   p = mempcpy(p, src->str, src->len);
                   *p = '\0';

               The  following  code  can  be  used  to chain‐copy from a string
               structure into an unterminated string:

                   p = mempcpy(p, src->str, src->len);

        Unterminated string structure (overlapping strings)
               In programs that make considerable use of strings, and need  the
               best  performance, using overlapping strings can make a big dif‐
               ference.  It allows holding substrings of a bigger string  while
               not duplicating memory nor using time to do a copy.

               However,  this is delicate, since it requires using unterminated
               strings.  C library APIs use strings, so programs that  use  un‐
               terminated  strings  will  have  to  take  care to differentiate
               strings from unterminated strings.

               The following code can be used to chain‐copy  from  an  untermi‐
               nated string structure to a string:

                   p = mempcpy(p, src->ustr, src->len);
                   *p = '\0';

               The  following  code  can be used to chain‐copy from an untermi‐
               nated string structure to an unterminated string:

                   p = mempcpy(p, src->ustr, src->len);
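
               As a sketch of how overlapping strings might be used, the fol‐
               lowing hypothetical helper (ustr_join3 is not an existing API)
               joins three unterminated string structures into a string.  It
               assumes a libc that provides mempcpy(3), which is a GNU exten‐
               sion.

```c
#define _GNU_SOURCE  /* for mempcpy(3), a GNU extension */
#include <stddef.h>
#include <string.h>

/* Unterminated string structure, as in the SYNOPSIS. */
struct ustr_s {
    size_t  len;
    char    *ustr;
};

/*
 * Hypothetical helper: write a, b, and c back to back into dst as a string.
 * dst must have room for a->len + b->len + c->len + 1 bytes.
 * Returns the length of the resulting string.
 */
size_t
ustr_join3(char *dst, const struct ustr_s *a, const struct ustr_s *b,
           const struct ustr_s *c)
{
    char *p;

    p = dst;
    p = mempcpy(p, a->ustr, a->len);
    p = mempcpy(p, b->ustr, b->len);
    p = mempcpy(p, c->ustr, c->len);
    *p = '\0';  /* mempcpy(3) does not terminate; do it once at the end. */
    return p - dst;
}
```

               Given char text[] = "key=value", the structures {3, text},
               {1, text + 3}, and {5, text + 4} describe "key", "=", and
               "value" without copying anything; joining them in reverse or‐
               der yields "value=key".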

    Functions
        stpcpy(3)
               This function copies the input string into a destination string.
               The programmer is responsible  for  allocating  a  buffer  large
               enough.  It returns a pointer suitable for chaining.

        stpecpy()
        stpecpyx()
               These functions copy the input string into a destination string.
               If  the destination buffer, limited by a pointer to one past the
               end of it, isn’t large enough to hold the  copy,  the  resulting
               string  is  truncated  (but  it  is guaranteed to be null‐termi‐
               nated).  They return a pointer suitable for  chaining.   Trunca‐
               tion needs to be detected only once after the last chained call.
               stpecpyx()  has identical semantics to stpecpy(), except that it
               forces a SIGSEGV on Undefined Behavior.

               These functions are not provided by any library, but you can de‐
               fine them with the following reference implementations:

                   /* This code is in the public domain. */
                   char *
                   stpecpy(char *dst, char past_end[0],
                           const char *restrict src)
                   {
                       char *p;

                       if (dst == past_end)
                           return past_end;

                       p = memccpy(dst, src, '\0', past_end - dst);
                       if (p != NULL)
                           return p - 1;

                       /* truncation detected */
                       past_end[-1] = '\0';
                       return past_end;
                   }

                   /* This code is in the public domain. */
                   char *
                   stpecpyx(char *dst, char past_end[0],
                            const char *restrict src)
                   {
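                       /* strlen(3) walks all of src, so a src that lacks a
                          null byte is read past its end (and likely faults). */
                       if (src[strlen(src)] != '\0')
                           raise(SIGSEGV);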

                       return stpecpy(dst, past_end, src);
                   }
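
               The truncation behavior can be exercised with a sketch like the
               following.  It repeats the logic of stpecpy() so that the block
               is self‐contained (past_end is spelled as a plain pointer here);
               greet() is a hypothetical caller, and memccpy(3) is assumed to
               be available.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the reference implementation of stpecpy(). */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/* Hypothetical caller: build "<greeting> <name>!"; false on truncation. */
bool
greet(char *buf, size_t sz, const char *greeting, const char *name)
{
    char *p, *past_end;

    past_end = buf + sz;
    p = buf;
    p = stpecpy(p, past_end, greeting);
    p = stpecpy(p, past_end, " ");
    p = stpecpy(p, past_end, name);
    p = stpecpy(p, past_end, "!");
    return p != past_end;  /* A single check covers the whole chain. */
}
```

               With a 32‐byte buffer, greet(buf, 32, "Hello", "world") returns
               true and leaves "Hello world!" in buf.  With an 8‐byte buffer it
               returns false and leaves the truncated, still null‐terminated
               string "Hello w".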

        stpncpy(3)
               This function copies the input string into a  destination  null‐
               padded  fixed‐width  unterminated  string.   If  the destination
               buffer, limited by its size, isn’t  large  enough  to  hold  the
               copy,  the  resulting  string is truncated.  Since it creates an
               unterminated string, it doesn’t need to write a terminating null
               byte.  It returns a pointer suitable for chaining, but it’s  not
               ideal for that.  Truncation needs to be detected only once after
               the last chained call.

               If  you’re going to use this function in chained calls, it would
               probably be useful to develop a function similar to stpecpy().

        ustr2stp()
               This function copies the input unterminated string, contained in
               a null‐padded fixed‐width buffer, into a destination
               (null‐terminated) string.  The programmer is responsible for
               allocating a buffer large enough.  It returns a pointer suitable
               for chaining.

               A truncating version of this function doesn’t exist,  since  the
               size  of  the original string is always known, so it wouldn’t be
               very useful.

               This function is not provided by any library, but you can define
               it with the following reference implementation:

                   /* This code is in the public domain. */
                   char *
                   ustr2stp(char *restrict dst, const char *restrict src,
                            size_t sz)
                   {
                       char  *end;

                       end = memccpy(dst, src, '\0', sz);
                       /* memccpy(3) returns one past the copied null byte. */
                       end = (end != NULL) ? end - 1 : dst + sz;
                       *end = '\0';

                       return end;
                   }
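
               As an illustration of chaining, here is a sketch that formats
               two fixed‐width fields into one string.  It uses one possible
               definition of ustr2stp(), written with strnlen(3) and memcpy(3)
               rather than memccpy(3); struct login and login_format() are
               hypothetical.

```c
#define _XOPEN_SOURCE 700  /* for strnlen(3) and stpcpy(3) */
#include <stddef.h>
#include <string.h>

/* One possible definition: copy at most sz non-null bytes, then terminate. */
char *
ustr2stp(char *restrict dst, const char *restrict src, size_t sz)
{
    size_t len;

    len = strnlen(src, sz);  /* Padding null bytes are not counted. */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst + len;        /* Pointer to the terminating null byte. */
}

/* Hypothetical record with two null-padded fixed-width fields. */
struct login {
    char  user[8];
    char  host[8];
};

/*
 * Format "user@host" into dst, which needs room for
 * sizeof(l->user) + sizeof(l->host) + 2 bytes.  Returns the length.
 */
size_t
login_format(char *dst, const struct login *l)
{
    char *p;

    p = dst;
    p = ustr2stp(p, l->user, sizeof(l->user));
    p = stpcpy(p, "@");
    p = ustr2stp(p, l->host, sizeof(l->host));
    return p - dst;
}
```

               A field that is completely full (no padding null byte at all)
               is handled the same way as a padded one, since sz bounds the
               read.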

        mempcpy(3)
               This function copies the input string, limited  by  its  length,
               into  a  destination unterminated string.  The programmer is re‐
               sponsible for allocating a buffer large enough.   It  returns  a
               pointer suitable for chaining.

    Deprecated functions
        strlcpy(3bsd)
        strlcat(3bsd)
               Deprecated.  These functions copy the input string into a desti‐
               nation  string.  If the destination buffer, limited by its size,
               isn’t large enough to hold the copy,  the  resulting  string  is
               truncated  (but  it  is guaranteed to be null‐terminated).  They
               return the length of the total  string  they  tried  to  create.
               These functions force a SIGSEGV on Undefined Behavior.

               stpecpyx()  is  a better replacement for these functions for the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  Only requires detecting truncation once per chain of calls.

        strscpy(9)
               Deprecated.  This function copies the input string into a desti‐
               nation string.  If the destination buffer, limited by its  size,
               isn’t  large  enough  to  hold the copy, the resulting string is
               truncated (but it is guaranteed to be null‐terminated).  It  re‐
               turns the length of the destination string, or -E2BIG on trunca‐
               tion.

               stpecpy()  is  a  better replacement for this function, since it
               has a much simpler interface.

        strcpy(3)
        strcat(3)
               Deprecated.  These functions copy the input string into a desti‐
               nation string.  The programmer is responsible for  allocating  a
               buffer large enough.  The return value is useless.

               strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
               turn value.

               stpcpy(3) is a better replacement for these  functions  for  the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  No need to call strlen(3), thanks to the useful return value.

        strncpy(3)
               Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
               the useless return value.  Due to the return  value,  with  this
               function  it’s hard to correctly check for truncation.  Use stp‐
               ncpy(3) instead.

        strncat(3)
               Deprecated.  Do not confuse this function with strncpy(3);  they
               are not related at all.

               This function concatenates the input unterminated string, con‐
               tained in a null‐padded fixed‐width buffer, into  a  destination
               (null‐terminated) string.  The programmer is responsible for al‐
               locating a buffer large enough.  The return value is useless.

               ustr2stp()  is  a  better  replacement for this function for the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  No need to call strlen(3), thanks to the useful return value.

               •  Function name that is not actively confusing.

RETURN VALUE
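        The following functions never truncate.  stpcpy(3) and ustr2stp() return
        a pointer to the terminating null byte in the destination string;
        mempcpy(3) returns a pointer to one past the last byte copied, which is
        where the caller writes the terminating null byte.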

        •  stpcpy(3)

        •  ustr2stp()

        •  mempcpy(3)

        The  following  functions return a pointer to the terminating null byte
        in the destination string, except when truncation occurs; if truncation
        occurs, they return a pointer to one past the end  of  the  destination
        buffer.

        •  stpecpy()

        •  stpecpyx()

        The  following function returns a pointer to one after the last charac‐
        ter in the destination unterminated string; if truncation occurs,  that
        pointer  is equivalent to a pointer to one past the end of the destina‐
        tion buffer.

        •  stpncpy(3)

    Deprecated
        The following functions return the length of the total string that they
        tried to create (as if truncation didn’t occur).

        •  strlcpy(3bsd)

        •  strlcat(3bsd)

        The following function returns the length of the destination string, or
        -E2BIG on truncation.

        •  strscpy(9)

        The following functions return the dst pointer, which is useless.

        •  strcpy(3)

        •  strcat(3)

        •  strncpy(3)

        •  strncat(3)

CAVEATS
        Some of the functions described here are not provided by  any  library;
        you should write your own copy if you want to use them.

        The  deprecated status of these functions varies from system to system.
        This page declares as deprecated those functions that have a better re‐
        placement documented in this same page.

EXAMPLES
        The following are examples of correct use of each of these functions.

        stpcpy(3)
                   p = buf;
                   p = stpcpy(p, "Hello ");
                   p = stpcpy(p, "world");
                   p = stpcpy(p, "!");
                   len = p - buf;
                   puts(buf);

        stpecpy()
        stpecpyx()
                   past_end = buf + sizeof(buf);
                   p = buf;
                   p = stpecpy(p, past_end, "Hello ");
                   p = stpecpy(p, past_end, "world");
                   p = stpecpy(p, past_end, "!");
                   if (p == past_end) {
                       p--;
                       goto toolong;
                   }
                   len = p - buf;
                   puts(buf);

        stpncpy(3)
                   past_end = buf + sizeof(buf);
                   end = stpncpy(buf, "Hello world!", sizeof(buf));
                   if (end == past_end)
                       goto toolong;
                   len = end - buf;
                   for (size_t i = 0; i < sizeof(buf); i++)
                       putchar(buf[i]);

        ustr2stp()
                   p = buf;
                   p = ustr2stp(p, "Hello ", 6);
                   p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
                   p = ustr2stp(p, "!", 1);
                   len = p - buf;
                   puts(buf);

        mempcpy(3)
                   p = buf;
                   p = mempcpy(p, "Hello ", 6);
                   p = mempcpy(p, "world", 5);
                   p = mempcpy(p, "!", 1);
                   *p = '\0';
                   len = p - buf;
                   puts(buf);

    Deprecated
        strlcpy(3bsd)
        strlcat(3bsd)
                   if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                       goto toolong;
                   if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                       goto toolong;
                   len = strlcat(buf, "!", sizeof(buf));
                   if (len >= sizeof(buf))
                       goto toolong;
                   puts(buf);

        strscpy(9)
                   len = strscpy(buf, "Hello world!", sizeof(buf));
                   if (len == -E2BIG)
                       goto toolong;
                   puts(buf);

        strcpy(3)
        strcat(3)
                   strcpy(buf, "Hello ");
                   strcat(buf, "world");
                   strcat(buf, "!");
                   len = strlen(buf);
                   puts(buf);

        strncpy(3)
                   strncpy(buf, "Hello world!", sizeof(buf));
                   if (buf[sizeof(buf) - 1] != '\0')
                       goto toolong;
                   len = strnlen(buf, sizeof(buf));
                   for (size_t i = 0; i < sizeof(buf); i++)
                       putchar(buf[i]);

        strncat(3)
                   strncpy(buf, "Hello ", 6);
                   strncat(buf, "world", 42);  // Padding null bytes ignored.
                   strncat(buf, "!", 1);
                   puts(buf);

Oops, that example was mistaken; too much cut and paste.

       strncat(3)
                  buf[0] = '\0';
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);


SEE ALSO
        memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                      string_copy(7)




--
<http://www.alejandro-colomar.es/>
