On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > The v4 approach preserves performance. It wasn't switching to byte_at_a_time: That v4 looks better, but still pointless. But it might be acceptable, with that final *out = (*out & ~mask) | (c & mask); just replaced with *out = c & mask; which still writes past the end, but now it only writes zeroes. But the only reason for that to be done is if you have exposed the destination buffer to another thread before (and you zeroed it before exposing it), and you want to make sure that any concurrent reader can never see anything past the end of the string. Again - the *only* case that could possibly matter is when you pre-zeroed the buffer, because if you didn't, then a concurrent reader would see random garbage *anyway*, particularly since there is no SMP memory ordering imposed with the strncpy. So nothing but "pre-zeroed" makes any possible sense, which means that the whole "(*out & ~mask)" in that v4 patch is completely and utterly meaningless. There's absolutely zero reason to try to preserve any old data. In other words, you have two cases: (a) no threaded and unlocked accesses to the resulting string (b) you _do_ have concurrent threaded accesses to the string and no locking (really? That's seriously questionable), If you have case (a), then the only correct thing to do is to explicitly pad afterwards. It's optimal, and doesn't make any assumptions about implementation of strncpy_from_user(). If you really have that case (b), and you absolutely require that the filling be done without exposing any temporary garbage, and thus the "pad afterwards" doesn't work, then you are doing something seriously odd. But in that seriously odd (b) case, the _only_ possibly valid thing you can do is to pre-zero the buffer, since strncpy_from_user() doesn't even imply any memory ordering in its accesses, so any concurrent reader by definition will see a randomly ordered partial string being copied. That strikes me as completely insane, but at least a careful reader could see a valid partial string being possibly in the process of being built up. But again, that use-case is only possible if the buffer is pre-zeroed, so doing that "(*out & ~mask)" cannot be relevant or sane. If you really do have that (b) case, then I'd accept that modified v4 patch, together with an absolutely *huge* comment both in strncpy_from_user() and very much at the call-site, talking about that non-locked concurrent access to the destination buffer. Linus