Re: [PATCH bpf v6 1/2] lib/strncpy_from_user.c: Don't overcopy bytes after NUL terminator

Daniel Xu <dxu@xxxxxxxxx> · Mon, 16 Nov 2020 20:53:05 -0800

Hi Linus,

On Mon, Nov 16, 2020 at 02:15:52PM -0800, Linus Torvalds wrote:
> On Mon, Nov 16, 2020 at 1:17 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > Based on on-list discussion and some off-list discussion with Alexei,
> > I'd like to propose the v4-style patch without the `(*out & ~mask)`
> > bit.
> 
> So I've verified that at least on x86-64, this doesn't really make
> code generation any worse, and I'm ok with the patch from that
> standpoint.
> 
> However, this was not what the discussion actually amended at as far
> as I'm concerned.
> 
> I mentioned that if BPF cares about the bytes past the end of the
> string, I want a *BIG COMMENT* about it. Yes, in strncpy_from_user()
> itself, but even more in the place that cares.

Sure, sorry. Will send another version with comments.

> And no, that does not mean bpf_probe_read_user_str().  That function
> clearly doesn't care at all, and doesn't access anything past the end
> of the string. I want a comment in whatever code that accesses past
> the end of the string.

If I'm reading things right, that place is technically in
kernel/bpf/hashtab.c:alloc_htab_elem. In line:

    memcpy(l_new->key, key, key_size);

where `key_size` is the width of the hashtab key. The flow looks like:

    <bpf prog code>
      bpf_map_update_elem()
        htab_map_update_elem()
          alloc_htab_elem()

It feels a bit weird to me to add a comment about strings in there
because the hash table code is string-agnostic, as mentioned in the
commit message.

> And your ABI point is actively misleading:
> 
> > We can't really zero out the rest of the buffer due to ABI issues.
> > The bpf docs state for bpf_probe_read_user_str():
> >
> > > In case the string length is smaller than *size*, the target is not
> > > padded with further NUL bytes.
> 
> This comment is actively wrong and misleading.
> 
> The code (after the patch) clearly *does* pad a bit with "further NUL
> bytes". It's just that it doesn't pad all the way to the end.

Right, it's a bit ugly and perhaps misleading. But it fixes the real
problem of keying a map with potentially random memory that
strncpy_from_user() might append on. If we pad a deterministic number of
zeros it should be ok.

> Where is the actual buffer zeroing done?

Usually the bpf prog does it. I believe (could be wrong) the verifier
enforces the key is initialized in some form.

For my specific use case, it's done in the bpftrace code:
https://github.com/iovisor/bpftrace/blob/0c88a1a4711a3d13e8c9475f7d08f83a5018fdfd/src/ast/codegen_llvm.cpp#L529-L530

> Because without the buffer zeroing, this whole patch is completely
> pointless. Which is why I want that comment, and I want a pointer to
> where that zeroing is done.
> 
> Really. You have two cases:
> 
>  (a) the buffer isn't zeroed before the strncpy_from_user()
> 
>  (b) the buffer is guaranteed to be zero before that
> 
> and in case (a), this patch is pointless, since the data after the
> string is already undefined.

See above -- I think the verifier enforces that the data is initialized.

> And in case (b), I want to see a comment and a pointer to the code
> that actually does the zeroing.

See above also.

> HOWEVER. Look at bpf_probe_read_user_str_common(), and notice how it
> ALREADY does the zeroing of the buffer past the end, it's just that it
> only does it in the error case.

I don't think the error case is very relevant here. I normally wouldn't
make any assumptions about the state of a buffer after a failed function
call.

> Why do you send this patch, instead of
> 
>  (a) get rid of the pointless pre-zeroing
> 
>  (b) change bpf_probe_read_user_str_common() to do
> 
>         int ret;
>         u32 offset;
> 
>         ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
>         offset = ret < 0 ? 0 : ret;
>         memset(dst+offset, 0, size-offset);
>         return ret;
> 
> which seems to be much simpler anyway. The comment you quote about
> "target is not padded with further NUL bytes" is clearly wrong anyway,
> since that error case *does* pad the target with NUL bytes, and always
> has.

I think on the bpf side there's the same concern about performance.
You can't really dynamically size buffers in bpf land so users usually
use a larger buffer than necessary, sometimes on the order of KBs. So if
we unnecessarily zero out the rest of the buffer it could cause perf
regressions.

[...]

Thanks,
Daniel