Re: [PATCH v3 bpf-next 01/10] lib/buildid: harden build ID parsing logic

Jann Horn <jannh@xxxxxxxxxx> · Wed, 7 Aug 2024 17:11:27 +0200

+Matthew and fsdevel list for pagecache question

On Tue, Jul 30, 2024 at 10:39 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> Harden build ID parsing logic, adding explicit READ_ONCE() where it's
> important to have a consistent value read and validated just once.
>
> Fixes tag below points to the code that moved this code into
> lib/buildid.c, and then subsequently was used in perf subsystem, making
> this code exposed to perf_event_open() users in v5.12+.

One thing that still seems dodgy to me with this patch applied is the
call from build_id_parse() to find_get_page(), followed by reading the
page contents. My understanding of the page cache (which might be
incorrect) is that find_get_page() can return a page whose contents
have not been initialized yet, and you're supposed to check for
PageUptodate() or something like that before reading from it.

Maybe Matthew can check if I understood that right?

Also, it might be a good idea to liberally spray READ_ONCE() around
all the remaining unannotated shared memory accesses in
build_id_parse(), get_build_id_32(), get_build_id_64() and
parse_build_id_buf().

> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Suggested-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib")
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  lib/buildid.c | 51 +++++++++++++++++++++++++++------------------------
>  1 file changed, 27 insertions(+), 24 deletions(-)
>
> diff --git a/lib/buildid.c b/lib/buildid.c
> index e02b5507418b..d21d86f6c19a 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -18,28 +18,29 @@ static int parse_build_id_buf(unsigned char *build_id,
>                               const void *note_start,
>                               Elf32_Word note_size)
>  {
> +       const char note_name[] = "GNU";
> +       const size_t note_name_sz = sizeof(note_name);
>         Elf32_Word note_offs = 0, new_offs;
> +       u32 name_sz, desc_sz;
> +       const char *data;
>
>         while (note_offs + sizeof(Elf32_Nhdr) < note_size) {
>                 Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs);
>
> +               name_sz = READ_ONCE(nhdr->n_namesz);
> +               desc_sz = READ_ONCE(nhdr->n_descsz);
>                 if (nhdr->n_type == BUILD_ID &&
> -                   nhdr->n_namesz == sizeof("GNU") &&
> -                   !strcmp((char *)(nhdr + 1), "GNU") &&
> -                   nhdr->n_descsz > 0 &&
> -                   nhdr->n_descsz <= BUILD_ID_SIZE_MAX) {
> -                       memcpy(build_id,
> -                              note_start + note_offs +
> -                              ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr),
> -                              nhdr->n_descsz);
> -                       memset(build_id + nhdr->n_descsz, 0,
> -                              BUILD_ID_SIZE_MAX - nhdr->n_descsz);
> +                   name_sz == note_name_sz &&
> +                   strcmp((char *)(nhdr + 1), note_name) == 0 &&
> +                   desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) {
> +                       data = note_start + note_offs + ALIGN(note_name_sz, 4);

I don't think we have any guarantee here that this addition won't
result in an OOB pointer?

> +                       memcpy(build_id, data, desc_sz);

I think this can access OOB data (because "data" can already be OOB
and because "desc_sz" hasn't been checked against the amount of
remaining space in the page).

> +                       memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz);
>                         if (size)
> -                               *size = nhdr->n_descsz;
> +                               *size = desc_sz;
>                         return 0;
>                 }
> -               new_offs = note_offs + sizeof(Elf32_Nhdr) +
> -                       ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4);
> +               new_offs = note_offs + sizeof(Elf32_Nhdr) + ALIGN(name_sz, 4) + ALIGN(desc_sz, 4);
>                 if (new_offs <= note_offs)  /* overflow */
>                         break;

You check whether "new_offs" has wrapped here, but then on the next
loop iteration, you check for "note_offs + sizeof(Elf32_Nhdr) <
note_size". So if new_offs is 0xffffffff at this point, then I think
the overflow check here will be passed, the loop condition will be
true on 32-bit kernels (on 64-bit kernels it won't be because the
addition happens on 64-bit numbers thanks to sizeof()), and "nhdr"
will point in front of the note?

>                 note_offs = new_offs;