On Fri, 29 Nov 2024 at 23:15, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> And yes, we could make the word-at-a-time case also know about masking
> the last word, but it's kind of annoying and depends on byte ordering.

Actually, it turned out to be really trivial to do.

It does depend on byte order, but not in a very complex way.

Also, doing the memory accesses with READ_ONCE() might be good for
clarity, but it makes gcc have conniptions and makes the code
generation noticeably worse.

I'm not sure why, but gcc stops doing address generation in the memory
instruction for volatile accesses. I've seen that before, but
completely forgot about how odd the code generation becomes.

This actually generates quite good code - apart from the later
'memset()' by strscpy_pad(). Kind of sad, since the word-at-a-time
code by 'strscpy()' actually handles comm[] really well (the buffer is
a nice multiple of the word length), and extending it to padding would
be trivial.

The whole sized_strscpy_pad() macro is in fact all kinds of stupid. It does

        __wrote = sized_strscpy(__dst, __src, __count);
        if (__wrote >= 0 && __wrote < __count)

and that '__wrote' name is actively misleading, and the "__wrote <
__count" test is pointless. The underlying sized_strscpy() function
doesn't return how many characters it wrote, it returns the length of
the resulting string (or error if it truncated it), so the return
value is *always* smaller than __count.

That's the whole point of the function, after all.

Oh well. I'll just commit my strscpy() improvement as a fix. And I'll
think about how to do the "pad" version better too. Just because.

             Linus
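
For illustration only, the return-value argument above can be checked
with a small user-space model. This is a sketch under assumptions, not
the kernel implementation: model_strscpy() and model_strscpy_pad() are
hypothetical names, the copy is byte-at-a-time rather than
word-at-a-time, and the only thing modelled is the contract described
in the mail (return the length of the resulting string, or -E2BIG on
truncation). Under that contract a non-negative result is always
smaller than the buffer size, so the second half of the
"__wrote >= 0 && __wrote < __count" test can never fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of the contract described above:
 * copy at most count - 1 bytes, always NUL-terminate, and return the
 * length of the resulting string, or -E2BIG if the source was truncated.
 */
static long model_strscpy(char *dst, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	for (i = 0; i < count - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* src[i] is NUL exactly when the whole source string fit. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}

/* Hypothetical model of the "pad" variant: zero-fill the rest of dst. */
static long model_strscpy_pad(char *dst, const char *src, size_t count)
{
	long len = model_strscpy(dst, src, count);

	if (len >= 0) {
		/* A string length is always < count, so no extra test is needed. */
		assert((size_t)len < count);
		memset(dst + len + 1, 0, count - (size_t)len - 1);
	}
	return len;
}
```

With an 8-byte buffer, copying "abc" through model_strscpy_pad()
returns 3 and leaves five zero bytes after the string, while a source
of eight or more characters returns -E2BIG and leaves a truncated,
NUL-terminated 7-character string in the buffer.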