Hi Linus, On Mon, Nov 16, 2020 at 02:15:52PM -0800, Linus Torvalds wrote: > On Mon, Nov 16, 2020 at 1:17 PM Daniel Xu <dxu@xxxxxxxxx> wrote: > > > > Based on on-list discussion and some off-list discussion with Alexei, > > I'd like to propose the v4-style patch without the `(*out & ~mask)` > > bit. > > So I've verified that at least on x86-64, this doesn't really make > code generation any worse, and I'm ok with the patch from that > standpoint. > > However, this was not what the discussion actually amended at as far > as I'm concerned. > > I mentioned that if BPF cares about the bytes past the end of the > string, I want a *BIG COMMENT* about it. Yes, in strncpy_from_user() > itself, but even more in the place that cares. Sure, sorry. Will send another version with comments. > And no, that does not mean bpf_probe_read_user_str(). That function > clearly doesn't care at all, and doesn't access anything past the end > of the string. I want a comment in whatever code that accesses past > the end of the string. If I'm reading things right, that place is technically in kernel/bpf/hashtab.c:alloc_htab_elem. In line: memcpy(l_new->key, key, key_size); where `key_size` is the width of the hashtab key. The flow looks like: <bpf prog code> bpf_map_update_elem() htab_map_update_elem() alloc_htab_elem() It feels a bit weird to me to add a comment about strings in there because the hash table code is string-agnostic, as mentioned in the commit message. > And your ABI point is actively misleading: > > > We can't really zero out the rest of the buffer due to ABI issues. > > The bpf docs state for bpf_probe_read_user_str(): > > > > > In case the string length is smaller than *size*, the target is not > > > padded with further NUL bytes. > > This comment is actively wrong and misleading. > > The code (after the patch) clearly *does* pad a bit with "further NUL > bytes". It's just that it doesn't pad all the way to the end. Right, it's a bit ugly and perhaps misleading. But it fixes the real problem of keying a map with potentially random memory that strncpy_from_user() might append on. If we pad a deterministic number of zeros it should be ok. > Where is the actual buffer zeroing done? Usually the bpf prog does it. I believe (could be wrong) the verifier enforces the key is initialized in some form. For my specific use case, it's done in the bpftrace code: https://github.com/iovisor/bpftrace/blob/0c88a1a4711a3d13e8c9475f7d08f83a5018fdfd/src/ast/codegen_llvm.cpp#L529-L530 > Because without the buffer zeroing, this whole patch is completely > pointless. Which is why I want that comment, and I want a pointer to > where that zeroing is done. > > Really. You have two cases: > > (a) the buffer isn't zeroed before the strncpy_from_user() > > (b) the buffer is guaranteed to be zero before that > > and in case (a), this patch is pointless, since the data after the > string is already undefined. See above -- I think the verifier enforces that the data is initialized. > And in case (b), I want to see a comment and a pointer to the code > that actually does the zeroing. See above also. > HOWEVER. Look at bpf_probe_read_user_str_common(), and notice how it > ALREADY does the zeroing of the buffer past the end, it's just that it > only does it in the error case. I don't think the error case is very relevant here. I normally wouldn't make any assumptions about the state of a buffer after a failed function call. > Why do you send this patch, instead of > > (a) get rid of the pointless pre-zeroing > > (b) change bpf_probe_read_user_str_common() to do > > int ret; > u32 offset; > > ret = strncpy_from_user_nofault(dst, unsafe_ptr, size); > offset = ret < 0 ? 0 : ret; > memset(dst+offset, 0, size-offset); > return ret; > > which seems to be much simpler anyway. The comment you quote about > "target is not padded with further NUL bytes" is clearly wrong anyway, > since that error case *does* pad the target with NUL bytes, and always > has. I think on the bpf side there's the same concern about performance. You can't really dynamically size buffers in bpf land so users usually use a larger buffer than necessary, sometimes on the order of KBs. So if we unnecessarily zero out the rest of the buffer it could cause perf regressions. [...] Thanks, Daniel