Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

> On Wed, Mar 6, 2024 at 2:32 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
>>
>> > On Mon, Mar 4, 2024 at 5:02 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >>
>> >> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
>> >>
>> >> > On Fri, Mar 1, 2024 at 4:35 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >> >>
>> >> >> John Fastabend <john.fastabend@xxxxxxxxx> writes:
>> >> >>
>> >> >> > Alexei Starovoitov wrote:
>> >> >> >> On Thu, Feb 29, 2024 at 3:23 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >> >> >> >
>> >> >> >> > The hashtab code relies on roundup_pow_of_two() to compute the number of
>> >> >> >> > hash buckets, and contains an overflow check by checking if the resulting
>> >> >> >> > value is 0. However, on 32-bit arches, the roundup code itself can overflow
>> >> >> >> > by doing a 32-bit left-shift of an unsigned long value, which is undefined
>> >> >> >> > behaviour, so it is not guaranteed to truncate neatly. This was triggered
>> >> >> >> > by syzbot on the DEVMAP_HASH type, which contains the same check, copied
>> >> >> >> > from the hashtab code. So apply the same fix to hashtab, by moving the
>> >> >> >> > overflow check to before the roundup.
>> >> >> >> >
>> >> >> >> > The hashtab code also contained a check that prevents the total allocation
>> >> >> >> > size for the buckets from overflowing a 32-bit value, but since all the
>> >> >> >> > allocation code uses u64s, this does not really seem to be necessary, so
>> >> >> >> > drop it and keep only the strict overflow check of the n_buckets variable.
>> >> >> >> >
>> >> >> >> > Fixes: daaf427c6ab3 ("bpf: fix arraymap NULL deref and missing overflow and zero size checks")
>> >> >> >> > Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>> >> >> >> > ---
>> >> >> >> >  kernel/bpf/hashtab.c | 10 +++++-----
>> >> >> >> >  1 file changed, 5 insertions(+), 5 deletions(-)
>> >> >> >> >
>> >> >> >> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> >> >> >> > index 03a6a2500b6a..4caf8dab18b0 100644
>> >> >> >> > --- a/kernel/bpf/hashtab.c
>> >> >> >> > +++ b/kernel/bpf/hashtab.c
>> >> >> >> > @@ -499,8 +499,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>> >> >> >> >  						  num_possible_cpus());
>> >> >> >> >  	}
>> >> >> >> >
>> >> >> >> > -	/* hash table size must be power of 2 */
>> >> >> >> > -	htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
>> >> >> >> >
>> >> >> >> >  	htab->elem_size = sizeof(struct htab_elem) +
>> >> >> >> >  			  round_up(htab->map.key_size, 8);
>> >> >> >> > @@ -510,11 +508,13 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>> >> >> >> >  		htab->elem_size += round_up(htab->map.value_size, 8);
>> >> >> >> >
>> >> >> >> >  	err = -E2BIG;
>> >> >> >> > -	/* prevent zero size kmalloc and check for u32 overflow */
>> >> >> >> > -	if (htab->n_buckets == 0 ||
>> >> >> >> > -	    htab->n_buckets > U32_MAX / sizeof(struct bucket))
>> >> >> >> > +	/* prevent overflow in roundup below */
>> >> >> >> > +	if (htab->map.max_entries > U32_MAX / 2 + 1)
>> >> >> >> >  		goto free_htab;
>> >> >> >>
>> >> >> >> No. We cannot artificially reduce max_entries that will break real users.
>> >> >> >> Hash table with 4B elements is not that uncommon.
>> >> >>
>> >> >> Erm, huh? The existing code has the n_buckets > U32_MAX / sizeof(struct
>> >> >> bucket) check, which limits max_entries to 134M (0x8000000). This patch
>> >> >> is *increasing* the maximum allowable size by a factor of 16 (to 2.1B or
>> >> >> 0x80000000).
>> >> >>
>> >> >> > Agree how about return E2BIG in these cases (32bit arch and overflow) and
>> >> >> > let user figure it out. That makes more sense to me.
>> >> >>
>> >> >> Isn't that exactly what this patch does? What am I missing here?
>> >> >
>> >> > I see. Then what are you fixing?
>> >> > roundup_pow_of_two() will return 0 and existing code is fine as-is.
>> >>
>> >> On 64-bit arches it will, yes. On 32-bit arches it ends up doing a
>> >> 32-bit left-shift (1UL << 32) of a 32-bit type (unsigned long), which is
>> >> UB, so there's no guarantee that it truncates down to 0. And it seems at
>> >> least on arm32 it does not: syzbot managed to trigger a crash in the
>> >> DEVMAP_HASH code by creating a map with more than 0x80000000 entries:
>> >>
>> >> https://lore.kernel.org/r/000000000000ed666a0611af6818@xxxxxxxxxx
>> >>
>> >> This patch just preemptively applies the same fix to the hashtab code,
>> >> since I could not find any reason why it shouldn't be possible to hit
>> >> the same issue there. I haven't actually managed to trigger a crash
>> >> there, though (I don't have any arm32 hardware to test this on), so in
>> >> that sense it's a bit theoretical for hashtab. So up to you if you want
>> >> to take this, but even if you don't, could you please apply the first
>> >> patch? That does fix the issue reported by syzbot (cf the
>> >> reported-and-tested-by tag).
>> >
>> > I see.
>> > Since roundup_pow_of_two() is non deterministic on 32-bit archs,
>> > let's fix them all.
>> >
>> > We have at least 5 to fix:
>> > bloom_filter.c:   nr_bits = roundup_pow_of_two(nr_bits);
>> > devmap.c:         dtab->n_buckets =
>> > roundup_pow_of_two(dtab->map.max_entries);
>> > hashtab.c:        htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
>> > stackmap.c:       n_buckets = roundup_pow_of_two(attr->max_entries);
>> >
>> > hashtab.c:        htab->map.max_entries = roundup(attr->max_entries,
>> > num_possible_cpus());
>> >
>> > bloom_filter looks ok as-is,
>> > but stack_map has the same issue as devmap and hashtab.
>> >
>> > Let's check for
>> > if (max_entries > (1u << 31))
>> > in 3 maps and that should be enough to cover all 5 cases?
>> >
>> > imo 1u << 31 is much easier to visualize than U32_MAX/2+1
>> >
>> > and don't touch other checks.
>> > This patch is removing U32_MAX / sizeof(struct bucket) check
>> > and with that introduces overflow just few lines below in bpf_map_area_alloc.
>>
>> Are you sure there's an overflow there? I did look at that and concluded
>> that since bpf_map_area_alloc() uses a u64 for the size that it would
>> not actually overflow even with n_buckets == 1<<31. There's a check in
>> __bpf_map_area_alloc() for the size:
>>
>> if (size >= SIZE_MAX)
>>         return NULL;
>>
>> with
>>
>> #define SIZE_MAX (~(size_t)0)
>>
>> in limits.h. So if sizeof(size_t) == 4, that check against SIZE_MAX
>> should trip and the allocation will just fail; but there's no overflow
>> anywhere AFAICT?
>
> There is an overflow _before_ it calls into bpf_map_area_alloc().
> Here is the line:
>         htab->buckets = bpf_map_area_alloc(htab->n_buckets *
>                                            sizeof(struct bucket),
>                                            htab->map.numa_node);
> that's why we have:
>         if (htab->n_buckets > U32_MAX / sizeof(struct bucket))
> before that.

Ah, right. I was assuming that the compiler was smart enough to
implicitly convert that into the type of the function parameter before
doing the multiplication, but of course that's not the case. Thanks!

-Toke