Re: [PATCH v5 0/9] Improve the copy of task comm

Yafang Shao <laoar.shao@xxxxxxxxx> · Thu, 8 Aug 2024 10:49:17 +0800

On Wed, Aug 7, 2024 at 1:28 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
>
> Hi Linus,
>
> Serge let me know about this thread earlier today.
>
> On 2024-08-05, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > One concern about removing the BUILD_BUG_ON() is that if we extend
> > > TASK_COMM_LEN to a larger size, such as 24, the caller with a
> > > hardcoded 16-byte buffer may overflow.
> >
> > No, not at all. Because get_task_comm() - and the replacements - would
> > never use TASK_COMM_LEN.
> >
> > They'd use the size of the *destination*. That's what the code already does:
> >
> >   #define get_task_comm(buf, tsk) ({                      \
> >   ...
> >         __get_task_comm(buf, sizeof(buf), tsk);         \
> >
> > note how it uses "sizeof(buf)".
>
> In shadow.git, we also implemented macros that are named after functions
> and calculate the appropriate number of elements internally.
>
>         $ grepc -h STRNCAT .
>         #define STRNCAT(dst, src)  strncat(dst, src, NITEMS(src))
>         $ grepc -h STRNCPY .
>         #define STRNCPY(dst, src)  strncpy(dst, src, NITEMS(dst))
>         $ grepc -h STRTCPY .
>         #define STRTCPY(dst, src)  strtcpy(dst, src, NITEMS(dst))
>         $ grepc -h STRFTIME .
>         #define STRFTIME(dst, fmt, tm)  strftime(dst, NITEMS(dst), fmt, tm)
>         $ grepc -h DAY_TO_STR .
>         #define DAY_TO_STR(str, day, iso)   day_to_str(NITEMS(str), str, day, iso)
>
> They're quite useful, and when implementing them we found and fixed
> several bugs thanks to them.
>
> > Now, it might be a good idea to also verify that 'buf' is an actual
> > array, and that this code doesn't do some silly "sizeof(ptr)" thing.
>
> I decided to use NITEMS() instead of sizeof() for that reason.
> (NITEMS() is just our name for ARRAY_SIZE().)
>
>         $ grepc -h NITEMS .
>         #define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
>
> > We do have a helper for that, so we could do something like
> >
> >    #define get_task_comm(buf, tsk) \
> >         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)
>
> We have SIZEOF_ARRAY() for when you want the size of an array:
>
>         $ grepc -h SIZEOF_ARRAY .
>         #define SIZEOF_ARRAY(a)      (sizeof(a) + must_be_array(a))

There is already a similar macro in Linux:

  /**
   * ARRAY_SIZE - get the number of elements in array @arr
   * @arr: array to be sized
   */
  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) +
__must_be_array(arr))

will use it instead of the sizeof().

>
> However, I don't think you want sizeof().  Let me explain why:
>
> -  Let's say you want to call wcsncpy(3) (I know nobody should be using
>    that function, not strncpy(3), but I'm using it as a standard example
>    of a wide-character string function).
>
>    You should call wcsncpy(dst, src, NITEMS(dst)).
>    A call wcsncpy(dst, src, sizeof(dst)) is bogus, since the argument is
>    the number of wide characters, not the number of bytes.
>
>    When translating that to normal characters, you want conceptually the
>    same operation, but on (normal) characters.  That is, you want
>    strncpy(dst, src, NITEMS(dst)).  While strncpy(3) with sizeof() works
>    just fine because sizeof(char)==1 by definition, it is conceptually
>    wrong to use it.
>
>    By using NITEMS() (i.e., ARRAY_SIZE()), you get the __must_be_array()
>    check for free.
>
> In the end, SIZEOF_ARRAY() is something we very rarely use.  It's there
> only used in the following two cases at the moment:
>
>         #define NITEMS(a)            (SIZEOF_ARRAY((a)) / sizeof((a)[0]))
>         #define MEMZERO(arr)  memzero(arr, SIZEOF_ARRAY(arr))
>
> Does that sound convincing?
>
> For memcpy(3) for example, you do want sizeof(), because you're copying
> raw bytes, but with strings, in which characters are conceptually
> meaningful elements, NITEMS() makes more sense.
>
> BTW, I'm working on a __lengthof__ operator that will soon allow using
> it on function parameters declared with array notation.  That is,
>
>         size_t
>         f(size_t n, int a[n])
>         {
>                 return __lengthof__(a);  // This will return n.
>         }
>
> If you're interested in it, I'm developing and discussing it here:
> <https://inbox.sourceware.org/gcc-patches/20240806122218.3827577-1-alx@xxxxxxxxxx/>
>
> >
> > as a helper macro for this all.
> >
> > (Although I'm not convinced we generally want the "_pad()" version,
> > but whatever).
>
> We had problems with it in shadow recently.  In user-space, it's similar
> to strncpy(3) (at least if you wrap it in a macro that makes sure that
> it terminates the string with a null byte).
>
> We had a lot of uses of strncpy(3), from old times where that was used
> to copy strings with truncation.  I audited all of that code (and
> haven't really finished yet), and translated to calls similar to
> strscpy(9) (we call it strtcpy(), as it _t_runcates).  The problem was
> that in some cases the padding was necessary, and in others it was not,
> and it was very hard to distinguish those.
>
> I recommend not zeroing strings unnecessarily, since that will make it
> hard to review the code later.  E.g., twenty years from now, someone
> takes a piece of code with a _pad() call, and has no clue if the zeroing
> was for a reason, or for no reason.
>
> On the other hand, not zeroing may make it easier to explot bugs, so
> whatever you think best.  In the kernel you may need to be more worried
> than in user space.  Whatever.  :)

Good point.
I will avoid using the _pad().

--
Regards
Yafang