Hi Linus, Serge let me know about this thread earlier today. On 2024-08-05, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > One concern about removing the BUILD_BUG_ON() is that if we extend > > TASK_COMM_LEN to a larger size, such as 24, the caller with a > > hardcoded 16-byte buffer may overflow. > > No, not at all. Because get_task_comm() - and the replacements - would > never use TASK_COMM_LEN. > > They'd use the size of the *destination*. That's what the code already does: > > #define get_task_comm(buf, tsk) ({ \ > ... > __get_task_comm(buf, sizeof(buf), tsk); \ > > note how it uses "sizeof(buf)". In shadow.git, we also implemented macros that are named after functions and calculate the appropriate number of elements internally. $ grepc -h STRNCAT . #define STRNCAT(dst, src) strncat(dst, src, NITEMS(src)) $ grepc -h STRNCPY . #define STRNCPY(dst, src) strncpy(dst, src, NITEMS(dst)) $ grepc -h STRTCPY . #define STRTCPY(dst, src) strtcpy(dst, src, NITEMS(dst)) $ grepc -h STRFTIME . #define STRFTIME(dst, fmt, tm) strftime(dst, NITEMS(dst), fmt, tm) $ grepc -h DAY_TO_STR . #define DAY_TO_STR(str, day, iso) day_to_str(NITEMS(str), str, day, iso) They're quite useful, and when implementing them we found and fixed several bugs thanks to them. > Now, it might be a good idea to also verify that 'buf' is an actual > array, and that this code doesn't do some silly "sizeof(ptr)" thing. I decided to use NITEMS() instead of sizeof() for that reason. (NITEMS() is just our name for ARRAY_SIZE().) $ grepc -h NITEMS . #define NITEMS(a) (SIZEOF_ARRAY((a)) / sizeof((a)[0])) > We do have a helper for that, so we could do something like > > #define get_task_comm(buf, tsk) \ > strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm) We have SIZEOF_ARRAY() for when you want the size of an array: $ grepc -h SIZEOF_ARRAY . #define SIZEOF_ARRAY(a) (sizeof(a) + must_be_array(a)) However, I don't think you want sizeof(). Let me explain why: - Let's say you want to call wcsncpy(3) (I know nobody should be using that function, not strncpy(3), but I'm using it as a standard example of a wide-character string function). You should call wcsncpy(dst, src, NITEMS(dst)). A call wcsncpy(dst, src, sizeof(dst)) is bogus, since the argument is the number of wide characters, not the number of bytes. When translating that to normal characters, you want conceptually the same operation, but on (normal) characters. That is, you want strncpy(dst, src, NITEMS(dst)). While strncpy(3) with sizeof() works just fine because sizeof(char)==1 by definition, it is conceptually wrong to use it. By using NITEMS() (i.e., ARRAY_SIZE()), you get the __must_be_array() check for free. In the end, SIZEOF_ARRAY() is something we very rarely use. It's there only used in the following two cases at the moment: #define NITEMS(a) (SIZEOF_ARRAY((a)) / sizeof((a)[0])) #define MEMZERO(arr) memzero(arr, SIZEOF_ARRAY(arr)) Does that sound convincing? For memcpy(3) for example, you do want sizeof(), because you're copying raw bytes, but with strings, in which characters are conceptually meaningful elements, NITEMS() makes more sense. BTW, I'm working on a __lengthof__ operator that will soon allow using it on function parameters declared with array notation. That is, size_t f(size_t n, int a[n]) { return __lengthof__(a); // This will return n. } If you're interested in it, I'm developing and discussing it here: <https://inbox.sourceware.org/gcc-patches/20240806122218.3827577-1-alx@xxxxxxxxxx/> > > as a helper macro for this all. > > (Although I'm not convinced we generally want the "_pad()" version, > but whatever). We had problems with it in shadow recently. In user-space, it's similar to strncpy(3) (at least if you wrap it in a macro that makes sure that it terminates the string with a null byte). We had a lot of uses of strncpy(3), from old times where that was used to copy strings with truncation. I audited all of that code (and haven't really finished yet), and translated to calls similar to strscpy(9) (we call it strtcpy(), as it _t_runcates). The problem was that in some cases the padding was necessary, and in others it was not, and it was very hard to distinguish those. I recommend not zeroing strings unnecessarily, since that will make it hard to review the code later. E.g., twenty years from now, someone takes a piece of code with a _pad() call, and has no clue if the zeroing was for a reason, or for no reason. On the other hand, not zeroing may make it easier to explot bugs, so whatever you think best. In the kernel you may need to be more worried than in user space. Whatever. :) > > Linus Have a lovely day! Alex -- <https://www.alejandro-colomar.es/>
Attachment:
signature.asc
Description: PGP signature