[PATCH v3 0/1] Rewritten page for string-copying functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi!

I've written a new manual page for documenting string-copying functions
so that it's clear what's the purpose of each of them.  It may differ
from the original design of the functions, since my guess for several of
them is simply that they were misdesigned.  However, after investigating
the operation that they perform on bytes, I've come up with a story that
can make sense of functions that were once believed to be broken by
many.  In fact, my conclusion after writing the page is that only one
function is really useless:

-  strncpy(3):  stpncpy(3) is _always_ better.

The others depend on the program.  If you don't care at all about
performance and Shlemiel is a friend of yours, then rcpy and [rn]cat
are your friends.  If you don't like Shlemiel, and don't mind slightly
more complex code, you'll go for 'p' functions.

And so on.  I won't spoil the page more.

Basically I want to end with this situation where a function like
strncpy(3) is dreaded by some because it looks broken (myself thought
that for a long time), and other who don't even know it misuse it for
what it shouldn't be useful, which is even worse.  Or where programmers
think that strncpy(3) and strncat(3) have any relationship at all (they
don't).

Below goes the formatted page.  Please review independently of it being
in strcpy(3) or string_copy(7), and address that as a separate issue
(but of course feel free to cover it, and any other issues).


Cheers,

Alex


Alejandro Colomar (1):
  strcpy.3: Rewrite page to document all string-copying functions

 man3/strcpy.3 | 1058 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 970 insertions(+), 88 deletions(-)


strcpy(3)                  Library Functions Manual                  strcpy(3)

NAME
       stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, strscpy,
       stpncpy, strncpy, ustr2stp, strncat, mempcpy - copy strings and charac‐
       ter sequences

LIBRARY
       stpcpy(3)
       strcpy(3), strcat(3)
       stpncpy(3)
       strncpy(3)
       strncat(3)
       mempcpy(3)
              Standard C library (libc, -lc)

       stpecpy(3), stpecpyx(3)
              Not provided by any library.

       strlcpy(3), strlcat(3)
              Utility functions from BSD systems (libbsd, -lbsd)

       strscpy(3)
              Not provided by any library.  It  is  a  Linux  kernel  internal
              function.

SYNOPSIS
       #include <string.h>

   Strings
       // Chain‐copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Copy/concatenate a string.
       char *strcpy(char *restrict dst, const char *restrict src);
       char *strcat(char *restrict dst, const char *restrict src);

       // Chain‐copy a string with truncation.
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain‐copy a string with truncation and SIGSEGV on UB.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy/concatenate a string with truncation and SIGSEGV on UB.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
       size_t strlcat(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Copy a string with truncation.
       ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                      size_t sz);

   Null‐padded character sequences
       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *strncpy(char dest[restrict .sz], const char *restrict src,
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a string.
       char *ustr2stp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Concatenate a null‐padded character sequence into a string.
       char *strncat(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

   Measured character sequences
       // Chain‐copy a measured character sequence.
       void *mempcpy(void *restrict dst, const void src[restrict .len],
                      size_t len);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       stpcpy(3), stpncpy(3):
           Since glibc 2.10:
               _POSIX_C_SOURCE >= 200809L
           Before glibc 2.10:
               _GNU_SOURCE

       mempcpy(3):
           _GNU_SOURCE

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is  a sequence of zero or more non‐null characters followed by a
              null byte.

       character sequence (ustr)
              is a sequence of zero or more non‐null  characters.   A  program
              should  never  usa  a  character  sequence where a string is re‐
              quired.  However, with appropriate care, a string can be used in
              the place of a character sequence.

              null‐padded character sequence
                     Character  sequences  can  be  contained  in  fixed‐width
                     buffers, which contain padding null bytes after the char‐
                     acter  sequence,  to  fill the rest of the buffer without
                     affecting the character sequence; however, those  padding
                     null bytes are not part of the character sequence.

              measured character sequence
                     Character sequence delimited by its length.

       length (len)
              is  the  number  of non‐null characters in a string or character
              sequence.   It  is  the  return  value  of  strlen(str)  and  of
              strnlen(ustr, sz).

       size (sz)
              refers  to  the  entire buffer where the string or character se‐
              quence is contained.

       end    is the name of a pointer to  the  terminating  null  byte  of  a
              string, or a pointer to one past the last character of a charac‐
              ter  sequence.  This is the return value of functions that allow
              chaining.  It is equivalent to &str[len].

       past_end
              is the name of a pointer to one past the end of the buffer  that
              contains  a  string  or character sequence.  It is equivalent to
              &str[sz].  It is used as a sentinel value, to be able  to  trun‐
              cate  strings  or character sequences instead of overrunning the
              containing buffer.

   Copy, concatenate, and chain‐copy
       Originally, there was a distinction between  functions  that  copy  and
       those  that  concatenate.  However, newer functions that copy while al‐
       lowing chaining cover both use cases with a single API.  They are  also
       algorithmically  faster, since they don’t need to search for the end of
       the existing string.  However, functions that concatenate have  a  much
       simpler  use,  so if performance is not important, it can make sense to
       use them for improving readability.

       To chain copy functions, they need to return  a  pointer  to  the  end.
       That’s  a  byproduct  of  the  copy operation, so it has no performance
       costs.  Functions that return such a pointer, and thus can be  chained,
       have  names  of the form *stp*() or *memp*(), since it’s also common to
       name the pointer just p.

       Chain‐copying functions that truncate should accept a  pointer  to  one
       past  the  end  of  the  destination buffer, and have names of the form
       *stpe*().  This allows not having to recalculate the remaining size af‐
       ter each call.

   Truncate or not?
       The first thing to note is that  programmers  should  be  careful  with
       buffers,  so  they  always have the correct size, and truncation is not
       necessary.

       In most cases, truncation is not desired, and it is simpler to just  do
       the copy.  Simpler code is safer code.  Programming against programming
       mistakes  by  adding more code just adds more points where mistakes can
       be made.

       Nowadays, compilers can detect most  programmer  errors  with  features
       like  compiler  warnings,  static  analyzers,  and _FORTIFY_SOURCE (see
       ftm(7)).  Keeping the code simple helps these  overflow‐detection  fea‐
       tures be more precise.

       When  validating  user input, however, it makes sense to truncate.  Re‐
       member to check the return value of such function calls.

       Functions that truncate:

       •  stpecpy(3) is the most efficient string copy function that  performs
          truncation.  It only requires to check for truncation once after all
          chained calls.

       •  stpecpyx(3)  is  a  variant  of  stpecpy(3) that consumes the entire
          source string, to catch bugs in the program by forcing  a  segmenta‐
          tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

       •  strlcpy(3bsd)  and  strlcat(3bsd) are designed to crash if the input
          string is invalid (doesn’t contain a terminating null byte).

       •  strscpy(3)  reports  an  error  instead  of  crashing  (similar   to
          stpecpy(3)).

       •  stpncpy(3)  and  strncpy(3)  also  truncate,  but  they  don’t write
          strings, but rather null‐padded character sequences.

   Null‐padded character sequences
       For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
       padded  character  sequences in fixed‐width buffers.  To interface with
       them, specialized functions need to be used.

       To copy strings into them, use stpncpy(3).

       To copy from an unterminated string within a fixed‐width buffer into  a
       string,  ignoring  any  trailing  null  bytes in the source fixed‐width
       buffer, you should use ustr2stp(3) or strncat(3).

   Measured character sequences
       The simplest character sequence copying function is mempcpy(3).  It re‐
       quires always knowing the length of your character sequences, for which
       structures can be used.  It makes the code much faster, since  you  al‐
       ways  know the length of your character sequences, and can do the mini‐
       mal copies and length measurements.  mempcpy(3)  copies  character  se‐
       quences, so you need to explicitly set the terminating null byte if you
       need a string.

       The  following code can be used to chain‐copy from a measured character
       sequence into a string:

           p = mempcpy(p, foo->ustr, foo->len);
           *p = '\0';

       The following code can be used to chain‐copy from a measured  character
       sequence into an unterminated string:

           p = mempcpy(p, bar->ustr, bar->len);

       In  programs  that  make  considerable  use of strings or character se‐
       quences, and need the best performance, using overlapping character se‐
       quences can make a big difference.  It allows holding subsequences of a
       larger character sequence.  while not duplicating memory nor using time
       to do a copy.

       However, this is delicate, since it requires using character sequences.
       C library APIs use strings, so programs that  use  character  sequences
       will  have  to  take care of differentiating strings from character se‐
       quences.

   String vs character sequence
       Some functions only operate on strings.  Those require that  the  input
       src  is  a string, and guarantee an output string (even when truncation
       occurs).  Functions that concatenate also  require  that  dst  holds  a
       string before the call.  List of functions:

       •  stpcpy(3)
       •  strcpy(3), strcat(3)
       •  stpecpy(3), stpecpyx(3)
       •  strlcpy(3bsd), strlcat(3bsd)
       •  strscpy(3)

       Other  functions  require  an  input string, but create a character se‐
       quence as output.  These functions have confusing  names,  and  have  a
       long history of misuse.  List of functions:

       •  stpncpy(3)
       •  strncpy(3)

       Other  functions  operate on an input character sequence, and create an
       output string.  Functions that concatenate also require that dst  holds
       a  string before the call.  strncat(3) has an even more misleading name
       than the functions above.  List of functions:

       •  ustr2stp(3)
       •  strncat(3)

       And the last one, operates on an input character sequence to create  an
       output  character  sequence.  But because it asks for the length, and a
       string is by nature composed of a character sequence of the same length
       plus a terminating null byte, a  string  is  also  accepted  as  input.
       Function:

       •  mempcpy(3)

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.
              The  programmer  is  responsible  for  allocating a buffer large
              enough.  It returns a pointer suitable for chaining.

              An implementation of this function might be:

                  char *
                  stpcpy(char *restrict dst, const char *restrict src)
                  {
                      return mempcpy(dst, src, strlen(src));
                  }

       strcpy(3)
       strcat(3)
              These functions copy the input string into a destination string.
              The programmer is responsible  for  allocating  a  buffer  large
              enough.  The return value is useless.

              stpcpy(3) is a faster alternative to these functions.

              An implementation of these functions might be:

                  char *
                  strcpy(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst, src);
                      return dst;
                  }

                  char *
                  strcat(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst + strlen(dst), src);
                      return dst;
                  }

       stpecpy(3)
       stpecpyx(3)
              These functions copy the input string into a destination string.
              If  the destination buffer, limited by a pointer to one past the
              end of it, isn’t large enough to hold the  copy,  the  resulting
              string  is  truncated  (but  it  is guaranteed to be null‐termi‐
              nated).  They return a pointer suitable for  chaining.   Trunca‐
              tion needs to be detected only once after the last chained call.
              stpecpyx(3)  has  identical semantics to stpecpy(3), except that
              it forces a SIGSEGV if the src pointer is not a string.

              These functions are not provided by any library, but you can de‐
              fine them with the following reference implementations:

                  /* This code is in the public domain. */
                  char *
                  stpecpy(char *dst, char past_end[0],
                          const char *restrict src)
                  {
                      char *p;

                      if (dst == past_end)
                          return past_end;

                      p = memccpy(dst, src, '\0', past_end - dst);
                      if (p != NULL)
                          return p - 1;

                      /* truncation detected */
                      past_end[-1] = '\0';
                      return past_end;
                  }

                  /* This code is in the public domain. */
                  char *
                  stpecpyx(char *dst, char past_end[0],
                           const char *restrict src)
                  {
                      if (src[strlen(src)] != '\0')
                          raise(SIGSEGV);

                      return stpecpy(dst, past_end, src);
                  }

       strlcpy(3bsd)
       strlcat(3bsd)
              These functions copy the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  They return the length
              of the total string they tried to create.  These functions force
              a SIGSEGV if the src pointer is not a string.

              stpecpyx(3) is a faster alternative to these functions.

       strscpy(3)
              This function copies the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  It returns the  length
              of the destination string, or -E2BIG on truncation.

              stpecpy(3) is a simpler and faster alternative to this function.

       stpncpy(3)
              This  function  copies the input string into a destination null‐
              padded character sequence in a fixed‐width buffer.  If the  des‐
              tination buffer, limited by its size, isn’t large enough to hold
              the  copy, the resulting character sequence is truncated.  Since
              it creates a character sequence, it doesn’t need to write a ter‐
              minating null byte.  It returns a pointer suitable for chaining,
              but it’s not ideal for that.  Truncation needs  to  be  detected
              only once after the last chained call.

              If  you’re going to use this function in chained calls, it would
              be useful to develop a similar function that accepts  a  pointer
              to one past the end of the buffer instead of a size.

              An implementation of this function might be:

                  char *
                  stpncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      char  *p;

                      bzero(dst, sz);
                      p = memccpy(dst, src, '\0', sz);
                      if (p == NULL)
                          return dst + sz;

                      return p - 1;
                  }

       ustr2stp(3)
              This function copies the input character sequence contained in a
              null‐padded  wixed‐width buffer, into a destination string.  The
              programmer is responsible for allocating a buffer large  enough.
              It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library, but you can define
              it with the following reference implementation:

                  /* This code is in the public domain. */
                  char *
                  ustr2stp(char *restrict dst, const char *restrict src,
                           size_t sz)
                  {
                      char  *end;

                      end = memccpy(dst, src, '\0', sz)) ?: dst + sz;
                      *end = '\0';

                      return end;
                  }

       strncpy(3)
              This  function is identical to stpncpy(3) except for the useless
              return value.  Due to the return value, with this function  it’s
              hard to correctly check for truncation.

              stpncpy(3) is a simpler alternative to this function.

              An implementation of this function might be:

                  char *
                  strncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      stpncpy(dst, src, sz);
                      return dst;
                  }

       strncat(3)
              Do  not  confuse this function with strncpy(3); they are not re‐
              lated at all.

              This function concatenates the  input  character  sequence  con‐
              tained  in  a null‐padded wixed‐width buffer, into a destination
              string.  The programmer is responsible for allocating  a  buffer
              large enough.  The return value is useless.

              ustr2stp(3) is a faster alternative to this function.

              An implementation of this function might be:

                  char *
                  strncat(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      ustr2stp(dst + strlen(dst), src, sz);
                      return dst;
                  }

       mempcpy(3)
              This  function  copies  the input character sequence, limited by
              its length, into a destination character sequence.  The program‐
              mer is responsible for allocating a buffer large enough.  It re‐
              turns a pointer suitable for chaining.

              An implementation of this function might be:

                  void *
                  mempcpy(void *restrict dst, const void *restrict src,
                          size_t len)
                  {
                      return memcpy(dst, src, len) + len;
                  }

RETURN VALUE
       The following functions return a pointer to the terminating  null  byte
       in the destination string.

       •  stpcpy(3)
       •  ustr2stp(3)

       The  following  functions return a pointer to the terminating null byte
       in the destination string, except when truncation occurs; if truncation
       occurs, they return a pointer to one past the end  of  the  destination
       buffer (past_end).

       •  stpecpy(3), stpecpyx(3)

       The  following function returns a pointer to one after the last charac‐
       ter in the destination character sequence; if truncation  occurs,  that
       pointer  is equivalent to a pointer to one past the end of the destina‐
       tion buffer.

       •  stpncpy(3)

       The following function returns a pointer to one after the last  charac‐
       ter in the destination character sequence.

       •  mempcpy(3)

       The following functions return the length of the total string that they
       tried to create (as if truncation didn’t occur).

       •  strlcpy(3bsd), strlcat(3bsd)

       The following function returns the length of the destination string, or
       -E2BIG on truncation.

       •  strscpy(3)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3), strcat(3)
       •  strncpy(3)
       •  strncat(3)

ATTRIBUTES
       For  an  explanation  of  the  terms  used in this section, see attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │stpcpy(), strcpy(), strcat(), stpecpy(),    │ Thread safety │ MT‐Safe │
       │stpecpyx() strlcpy(), strlcat(), strscpy(), │               │         │
       │stpncpy(), strncpy(), ustr2stp(),           │               │         │
       │strncat(), mempcpy()                        │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       strcpy(3), strcat(3)
       strncpy(3)
       strncat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       stpcpy(3)
       stpncpy(3)
              POSIX.1‐2008.

       strlcpy(3bsd), strlcat(3bsd)
              Functions originated in OpenBSD and present in  some  Unix  sys‐
              tems.

       mempcpy(3)
              This function is a GNU extension.

       strscpy(3)
              Linux kernel internal function.

       stpecpy(3), stpecpyx(3)
       ustr2stp(3)
              Not defined by any standards nor libraries.

CAVEATS
       Don’t  mix  chain calls to truncating and non‐truncating functions.  It
       is conceptually wrong unless you know that the first  part  of  a  copy
       will  always  fit.  Anyway, the performance difference will probably be
       negligible, so it will probably be more clear if you use consistent se‐
       mantics: either truncating or non‐truncating.  Calling a non‐truncating
       function after a truncating one is necessarily wrong.

       Some of the functions described here are not provided by  any  library;
       you should write your own copy if you want to use them.  See STANDARDS.

BUGS
       All  concatenation  (*cat()) functions share the same performance prob‐
       lem: Shlemiel the  painter  ⟨https://www.joelonsoftware.com/2001/12/11/
       back-to-basics/⟩.

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
                  p = buf;
                  p = stpcpy(p, "Hello ");
                  p = stpcpy(p, "world");
                  p = stpcpy(p, "!");
                  len = p - buf;
                  puts(buf);

       strcpy(3)
       strcat(3)
                  strcpy(buf, "Hello ");
                  strcat(buf, "world");
                  strcat(buf, "!");
                  len = strlen(buf);
                  puts(buf);

       stpecpy(3)
       stpecpyx(3)
                  past_end = buf + sizeof(buf);
                  p = buf;
                  p = stpecpy(p, past_end, "Hello ");
                  p = stpecpy(p, past_end, "world");
                  p = stpecpy(p, past_end, "!");
                  if (p == past_end) {
                      p--;
                      goto toolong;
                  }
                  len = p - buf;
                  puts(buf);

       strlcpy(3bsd)
       strlcat(3bsd)
                  if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  len = strlcat(buf, "!", sizeof(buf));
                  if (len >= sizeof(buf))
                      goto toolong;
                  puts(buf);

       strscpy(3)
                  len = strscpy(buf, "Hello world!", sizeof(buf));
                  if (len == -E2BIG)
                      goto toolong;
                  puts(buf);

       stpncpy(3)
                  past_end = buf + sizeof(buf);
                  end = stpncpy(buf, "Hello world!", sizeof(buf));
                  if (end == past_end)
                      goto toolong;
                  len = end - buf;
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       strncpy(3)
                  strncpy(buf, "Hello world!", sizeof(buf));
                  if (buf + sizeof(buf) - 1 == '\0')
                      goto toolong;
                  len = strnlen(buf, sizeof(buf));
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       ustr2stp(3)
                  p = buf;
                  p = ustr2stp(p, "Hello ", 6);
                  p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
                  p = ustr2stp(p, "!", 1);
                  len = p - buf;
                  puts(buf);

       strncat(3)
                  buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);

       mempcpy(3)
                  p = buf;
                  p = mempcpy(p, "Hello ", 6);
                  p = mempcpy(p, "world", 5);
                  p = mempcpy(p, "!", 1);
                  p = '\0';
                  len = p - buf;
                  puts(buf);

SEE ALSO
       bzero(3), memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                           strcpy(3)

-- 
2.38.1




[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux