Re: [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions

Formatted string_copy(7):

string_copy(7)         Miscellaneous Information Manual         string_copy(7)

NAME
       stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, stpncpy,
       strncpy, zustr2ustp,  zustr2stp,  strncat,  ustpcpy,  ustr2stp  -  copy
       strings and character sequences

SYNOPSIS
   Strings
       // Chain‐copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Copy/catenate a string.
       char *strcpy(char *restrict dst, const char *restrict src);
       char *strcat(char *restrict dst, const char *restrict src);

       // Chain‐copy a string with truncation.
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain‐copy a string with truncation and SIGSEGV on UB.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy/catenate a string with truncation and SIGSEGV on UB.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
       size_t strlcat(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

   Null‐padded character sequences
       // Zero a fixed‐width buffer, and
       // copy a string into a character sequence with truncation.
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Zero a fixed‐width buffer, and
       // copy a string into a character sequence with truncation.
       char *strncpy(char dest[restrict .sz], const char *restrict src,
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a character sequence.
       char *zustr2ustp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a string.
       char *zustr2stp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Catenate a null‐padded character sequence into a string.
       char *strncat(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

   Measured character sequences
       // Chain‐copy a measured character sequence.
       char *ustpcpy(char *restrict dst, const char src[restrict .len],
                      size_t len);

       // Chain‐copy a measured character sequence into a string.
       char *ustr2stp(char *restrict dst, const char src[restrict .len],
                      size_t len);

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is  a sequence of zero or more non‐null characters followed by a
              null byte.

       character sequence
              is a sequence of zero or more non‐null characters.  A program
              should never use a character sequence where a string is
              required.  However, with appropriate care, a string can be used
              in the place of a character sequence.

              null‐padded character sequence (zustr)
                     Character  sequences  can  be  contained  in  fixed‐width
                     buffers, which contain padding null bytes after the char‐
                     acter  sequence,  to  fill the rest of the buffer without
                     affecting the character sequence; however, those  padding
                     null bytes are not part of the character sequence.

              measured character sequence (ustr)
                     Character  sequence delimited by its length.  It may be a
                     slice of a  larger  character  sequence,  or  even  of  a
                     string.

       length (len)
              is  the  number  of non‐null characters in a string or character
              sequence.   It  is  the  return  value  of  strlen(str)  and  of
              strnlen(ustr, sz).

       size (sz)
              refers  to  the  entire buffer where the string or character se‐
              quence is contained.

       end    is the name of a pointer to  the  terminating  null  byte  of  a
              string, or a pointer to one past the last character of a charac‐
              ter  sequence.  This is the return value of functions that allow
              chaining.  It is equivalent to &str[len].

       past_end
              is the name of a pointer to one past the end of the buffer  that
              contains  a  string  or character sequence.  It is equivalent to
              &str[sz].  It is used as a sentinel value, to be able  to  trun‐
              cate  strings  or character sequences instead of overrunning the
              containing buffer.

       copy   This term is used when the writing starts at the  first  element
              pointed to by dst.

       catenate
              This  term  is  used when a function first finds the terminating
              null byte in dst, and then starts writing at that position.

       chain  This term is used  when  it’s  the  programmer  who  provides  a
              pointer  to  the  end in dst, and the function starts writing at
              that location.  The function returns a pointer to  the  new  end
              after  the call, so that the programmer can use it to chain such
              calls.

   Copy, catenate, and chain‐copy
       Originally, there was a distinction between  functions  that  copy  and
       those that catenate.  However, newer functions that copy while allowing
       chaining  cover  both use cases with a single API.  They are also algo‐
       rithmically faster, since they don’t need to search for the end of  the
       existing  string.  However, functions that catenate have a much simpler
       use, so if performance is not important, it can make sense to use  them
       for improving readability.

       To  chain  copy  functions,  they  need to return a pointer to the end.
       That’s a byproduct of the copy operation,  so  it  has  no  performance
       costs.   Functions that return such a pointer, and thus can be chained,
       have names of the form *stp*(), since it’s  also  common  to  name  the
       pointer just p.

       Chain‐copying  functions  that  truncate should accept a pointer to one
       past the end of the destination buffer, and  have  names  of  the  form
       *stpe*().  This allows not having to recalculate the remaining size af‐
       ter each call.

   Truncate or not?
       The  first  thing  to  note  is that programmers should be careful with
       buffers, so they always have the correct size, and  truncation  is  not
       necessary.

       In  most cases, truncation is not desired, and it is simpler to just do
       the copy.  Simpler code is safer code.  Programming against programming
       mistakes by adding more code just adds more points where  mistakes  can
       be made.

       Nowadays,  compilers  can  detect  most programmer errors with features
       like compiler warnings,  static  analyzers,  and  _FORTIFY_SOURCE  (see
       ftm(7)).   Keeping  the code simple helps these overflow‐detection fea‐
       tures be more precise.

       When validating user input, however, it makes sense to  truncate.   Re‐
       member to check the return value of such function calls.

       Functions that truncate:

       •  stpecpy(3)  is the most efficient string copy function that performs
          truncation.  It requires checking for truncation only once, after
          all chained calls.

       •  stpecpyx(3) is a variant of  stpecpy(3)  that  consumes  the  entire
          source  string,  to catch bugs in the program by forcing a segmenta‐
          tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

       •  strlcpy(3bsd) and strlcat(3bsd) are designed to crash if  the  input
          string is invalid (doesn’t contain a terminating null byte).

       •  stpncpy(3)  and  strncpy(3)  also  truncate,  but  they  don’t write
          strings, but rather null‐padded character sequences.

   Null‐padded character sequences
       For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
       padded  character  sequences in fixed‐width buffers.  To interface with
       them, specialized functions need to be used.

       To copy strings into them, use stpncpy(3).

       To copy from an unterminated string within a fixed‐width buffer into  a
       string,  ignoring  any  trailing  null  bytes in the source fixed‐width
       buffer, you should use zustr2stp(3) or strncat(3).

       To copy from an unterminated string within a fixed‐width buffer into  a
       character  sequence,  ignoring  any  trailing  null bytes in the source
       fixed‐width buffer, you should use zustr2ustp(3).

   Measured character sequences
       The simplest character sequence copying function is mempcpy(3).  It re‐
       quires always knowing the length of your character sequences, for which
       structures can be used.  It makes the code much faster, since  you  al‐
       ways  know the length of your character sequences, and can do the mini‐
       mal copies and length measurements.  mempcpy(3)  copies  character  se‐
       quences, so you need to explicitly set the terminating null byte if you
       need a string.

       However,  for keeping type safety, it’s good to add a wrapper that uses
       char * instead of void *: ustpcpy(3).

       In programs that make considerable use of strings or character
       sequences, and need the best performance, using overlapping character
       sequences can make a big difference.  It allows holding subsequences
       of a larger character sequence, while neither duplicating memory nor
       spending time on a copy.

       However, this is delicate, since it requires using character sequences.
       C  library  APIs  use strings, so programs that use character sequences
       will have to take care of differentiating strings  from  character  se‐
       quences.

       To copy a measured character sequence, use ustpcpy(3).

       To copy a measured character sequence into a string, use ustr2stp(3).

       Because  these  functions ask for the length, and a string is by nature
       composed of a character sequence of the same length plus a  terminating
       null byte, a string is also accepted as input.

   String vs character sequence
       Some  functions  only operate on strings.  Those require that the input
       src is a string, and guarantee an output string (even  when  truncation
       occurs).   Functions that catenate also require that dst holds a string
       before the call.  List of functions:

       •  stpcpy(3)
       •  strcpy(3), strcat(3)
       •  stpecpy(3), stpecpyx(3)
       •  strlcpy(3bsd), strlcat(3bsd)

       Other functions require an input string, but  create  a  character  se‐
       quence  as  output.   These  functions have confusing names, and have a
       long history of misuse.  List of functions:

       •  stpncpy(3)
       •  strncpy(3)

       Other functions operate on an input character sequence, and  create  an
       output  string.   Functions that catenate also require that dst holds a
       string before the call.  strncat(3) has an even  more  misleading  name
       than the functions above.  List of functions:

       •  zustr2stp(3)
       •  strncat(3)
       •  ustr2stp(3)

       Other  functions  operate  on  an input character sequence to create an
       output character sequence.  List of functions:

       •  ustpcpy(3)
       •  zustr2ustp(3)

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.
              The programmer is responsible  for  allocating  a  buffer  large
              enough.  It returns a pointer suitable for chaining.

       strcpy(3)
       strcat(3)
              These functions copy and catenate the input string into a desti‐
              nation  string.   The programmer is responsible for allocating a
              buffer large enough.  The return value is useless.

              stpcpy(3) is a faster alternative to these functions.

       stpecpy(3)
       stpecpyx(3)
              These functions copy the input string into a destination string.
              If the destination buffer, limited by a pointer to one past  the
              end  of  it,  isn’t large enough to hold the copy, the resulting
              string is truncated (but it  is  guaranteed  to  be  null‐termi‐
              nated).   They  return a pointer suitable for chaining.  Trunca‐
              tion needs to be detected only once after the last chained call.
              stpecpyx(3) has identical semantics to stpecpy(3),  except  that
              it forces a SIGSEGV if the src pointer is not a string.

              These  functions  are  not provided by any library; see EXAMPLES
              for a reference implementation.

       strlcpy(3bsd)
       strlcat(3bsd)
              These functions copy and catenate the input string into a desti‐
              nation string.  If the destination buffer, limited by its  size,
              isn’t  large  enough  to  hold the copy, the resulting string is
              truncated (but it is guaranteed to  be  null‐terminated).   They
              return  the  length  of  the  total string they tried to create.
              These functions force a SIGSEGV if the  src  pointer  is  not  a
              string.

              stpecpyx(3) is a faster alternative to these functions.

       stpncpy(3)
              This  function  copies the input string into a destination null‐
              padded character sequence in a fixed‐width buffer.  If the  des‐
              tination buffer, limited by its size, isn’t large enough to hold
              the  copy, the resulting character sequence is truncated.  Since
              it creates a character sequence, it doesn’t need to write a ter‐
              minating null byte.  It’s impossible to  distinguish  truncation
              after  the  call,  from  a character sequence that just fits the
              destination buffer;  truncation  should  be  detected  from  the
              length of the original string.

       strncpy(3)
              This  function is identical to stpncpy(3) except for the useless
              return value.

              stpncpy(3) is a more useful alternative to this function.

       zustr2ustp(3)
              This function copies the input character sequence contained in a
              null‐padded fixed‐width buffer, into a destination character
              sequence.  The programmer is responsible for allocating a buffer
              large enough.  It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library; see EXAMPLES for a
              reference implementation.

       zustr2stp(3)
              This function copies the input character sequence contained in a
              null‐padded  fixed‐width buffer, into a destination string.  The
              programmer is responsible for allocating a buffer large  enough.
              It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library; see EXAMPLES for a
              reference implementation.

       strncat(3)
              Do  not  confuse this function with strncpy(3); they are not re‐
              lated at all.

              This function catenates the input character  sequence  contained
              in  a null‐padded fixed‐width buffer, into a destination string.
              The programmer is responsible  for  allocating  a  buffer  large
              enough.  The return value is useless.

              zustr2stp(3) is a faster alternative to this function.

       ustpcpy(3)
              This  function  copies  the input character sequence, limited by
              its length, into a destination character sequence.  The program‐
              mer is responsible for allocating a buffer large enough.  It re‐
              turns a pointer suitable for chaining.

       ustr2stp(3)
              This function copies the input character  sequence,  limited  by
              its  length,  into  a destination string.  The programmer is re‐
              sponsible for allocating a buffer large enough.   It  returns  a
              pointer suitable for chaining.

RETURN VALUE
       The  following  functions return a pointer to the terminating null byte
       in the destination string.

       •  stpcpy(3)
       •  ustr2stp(3)
       •  zustr2stp(3)

       The following functions return a pointer to the terminating  null  byte
       in the destination string, except when truncation occurs; if truncation
       occurs,  they  return  a pointer to one past the end of the destination
       buffer (past_end).

       •  stpecpy(3), stpecpyx(3)

       The following function returns a pointer to one after the last  charac‐
       ter  in  the destination character sequence; if truncation occurs, that
       pointer is equivalent to a pointer to one past the end of the  destina‐
       tion buffer.

       •  stpncpy(3)

       The  following functions return a pointer to one after the last charac‐
       ter in the destination character sequence.

       •  zustr2ustp(3)
       •  ustpcpy(3)

       The following functions return the length of the total string that they
       tried to create (as if truncation didn’t occur).

       •  strlcpy(3bsd), strlcat(3bsd)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3), strcat(3)
       •  strncpy(3)
       •  strncat(3)

NOTES
       The Linux kernel has an internal function for copying strings, which is
       similar to stpecpy(3), except that it can’t be chained:

       strscpy(9)
              This function copies the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  It returns the  length
              of the destination string, or -E2BIG on truncation.

              stpecpy(3) is a simpler and faster alternative to this function.

CAVEATS
       Don’t  mix  chain calls to truncating and non‐truncating functions.  It
       is conceptually wrong unless you know that the first  part  of  a  copy
       will  always  fit.  Anyway, the performance difference will probably be
       negligible, so the code will probably be clearer if you use consistent
       semantics: either truncating or non‐truncating.  Calling a
       non‐truncating function after a truncating one is necessarily wrong.

BUGS
       All catenation functions share the same performance problem: Shlemiel
       the painter ⟨https://www.joelonsoftware.com/2001/12/11/back-to-basics/⟩.

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
              p = buf;
              p = stpcpy(p, "Hello ");
              p = stpcpy(p, "world");
              p = stpcpy(p, "!");
              len = p - buf;
              puts(buf);

       strcpy(3)
       strcat(3)
              strcpy(buf, "Hello ");
              strcat(buf, "world");
              strcat(buf, "!");
              len = strlen(buf);
              puts(buf);

       stpecpy(3)
       stpecpyx(3)
              past_end = buf + sizeof(buf);
              p = buf;
              p = stpecpy(p, past_end, "Hello ");
              p = stpecpy(p, past_end, "world");
              p = stpecpy(p, past_end, "!");
              if (p == past_end) {
                  p--;
                  goto toolong;
              }
              len = p - buf;
              puts(buf);

       strlcpy(3bsd)
       strlcat(3bsd)
              if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                  goto toolong;
              if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                  goto toolong;
              len = strlcat(buf, "!", sizeof(buf));
              if (len >= sizeof(buf))
                  goto toolong;
              puts(buf);

       strscpy(9)
              len = strscpy(buf, "Hello world!", sizeof(buf));
              if (len == -E2BIG)
                  goto toolong;
              puts(buf);

       stpncpy(3)
              end = stpncpy(buf, "Hello world!", sizeof(buf));
              if (sizeof(buf) < strlen("Hello world!"))
                  goto toolong;
              len = end - buf;
              for (size_t i = 0; i < sizeof(buf); i++)
                  putchar(buf[i]);

       strncpy(3)
              strncpy(buf, "Hello world!", sizeof(buf));
              if (sizeof(buf) < strlen("Hello world!"))
                  goto toolong;
              len = strnlen(buf, sizeof(buf));
              for (size_t i = 0; i < sizeof(buf); i++)
                  putchar(buf[i]);

       zustr2ustp(3)
              p = buf;
              p = zustr2ustp(p, "Hello ", 6);
              p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
              p = zustr2ustp(p, "!", 1);
              len = p - buf;
              printf("%.*s\n", (int) len, buf);

       zustr2stp(3)
              p = buf;
              p = zustr2stp(p, "Hello ", 6);
              p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
              p = zustr2stp(p, "!", 1);
              len = p - buf;
              puts(buf);

       strncat(3)
              buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
              strncat(buf, "Hello ", 6);
              strncat(buf, "world", 42);  // Padding null bytes ignored.
              strncat(buf, "!", 1);
              len = strlen(buf);
              puts(buf);

       ustpcpy(3)
              p = buf;
              p = ustpcpy(p, "Hello ", 6);
              p = ustpcpy(p, "world", 5);
              p = ustpcpy(p, "!", 1);
              len = p - buf;
              printf("%.*s\n", (int) len, buf);

       ustr2stp(3)
              p = buf;
              p = ustr2stp(p, "Hello ", 6);
              p = ustr2stp(p, "world", 5);
              p = ustr2stp(p, "!", 1);
              len = p - buf;
              puts(buf);

   Implementations
       Here are reference implementations for functions not provided by libc.

           /* This code is in the public domain. */

           char *
           stpecpy(char *dst, char past_end[0], const char *restrict src)
           {
               char *p;

               if (dst == past_end)
                   return past_end;

               p = memccpy(dst, src, '\0', past_end - dst);
               if (p != NULL)
                   return p - 1;

               /* truncation detected */
               past_end[-1] = '\0';
               return past_end;
           }

           char *
           stpecpyx(char *dst, char past_end[0], const char *restrict src)
           {
               if (src[strlen(src)] != '\0')
                   raise(SIGSEGV);

               return stpecpy(dst, past_end, src);
           }

           char *
           zustr2ustp(char *restrict dst, const char *restrict src, size_t sz)
           {
               return ustpcpy(dst, src, strnlen(src, sz));
           }

           char *
           zustr2stp(char *restrict dst, const char *restrict src, size_t sz)
           {
               char  *end;

               end = zustr2ustp(dst, src, sz);
               *end = '\0';

               return end;
           }

           char *
           ustpcpy(char *restrict dst, const char *restrict src, size_t len)
           {
               return mempcpy(dst, src, len);
           }

           char *
           ustr2stp(char *restrict dst, const char *restrict src, size_t len)
           {
               char  *end;

               end = ustpcpy(dst, src, len);
               *end = '\0';

               return end;
           }

SEE ALSO
       bzero(3), memcpy(3), memccpy(3), mempcpy(3), stpcpy(3),  strlcpy(3bsd),
       strncat(3), strncpy(3), string(3)

Linux man‐pages (unreleased)        (date)                      string_copy(7)


--
<http://www.alejandro-colomar.es/>
