Re: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 3 Aug 2024 16:51:44 -0700

On Sat, 3 Aug 2024 at 15:50, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> +               unsigned int n = open_files / BITS_PER_LONG;
> +               if (n % BITS_PER_LONG) {
> +                       unsigned long mask = BITMAP_LAST_WORD_MASK(n);
> +                       new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
> +               }

Ouch. I hate this.

And yes, clearly this fixes a bug, and clearly we don't maintain the
full_fds_bits[] array properly, but I still hate it.

The fact that we had this bug in the first place shows that our
full_fds_bits code is too subtle, and this makes it worse.

I really would prefer to do this in copy_fd_bitmaps() instead.

Also, your comment says

                 ... nothing of that sort happens to open_fds and
                * close_on_exec, since there the part that needs to be copied
                * will always fall on a word boundary.

but it doesn't really explain *why* those things are on a word boundary.

So I think part of the commentary should be that we always make sure
that the normal bitmaps are full words (pointing out the

       ALIGN(min(count, max_fds), BITS_PER_LONG)

in sane_fdtable_size), and then explaining that the reason the
'full_fds_bits' bitmap is special is that it is not a bitmap of file
descriptors, it's a bitmap of 'words of file descriptor bitmaps'.

So yes, the full_fds_bits bitmap is very special indeed, and yes, we
got this all wrong, but I feel like the bug is in copy_fd_bitmaps(),
and the fixup should be there too...

In fact, I think the full_fds_bits copy in copy_fd_bitmaps() should
just be entirely rewritten. Even the initial copy shouldn't be done
with some byte-wise memcpy/memset, since those are all 'unsigned long'
arrays, and the size is aligned. So it should be done on words, but
whatever.

And the full_fds_bits case should use our actual bitmap functions.

So instead of

        cpy = BITBIT_SIZE(count);
        set = BITBIT_SIZE(nfdt->max_fds) - cpy;
        memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy);
        memset((char *)nfdt->full_fds_bits + cpy, 0, set);

I think the code could (and should) do just

        cpy = count / BITS_PER_LONG;
        max = nfdt->max_fds / BITS_PER_LONG;

        bitmap_copy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy);
        bitmap_clear(nfdt->full_fds_bits, count, max - cpy);

or something like that.

Doesn't that seem more straightforward?

Of course, it's also entirely possible that I have just confused
myself, and the above is buggy garbage. I tried to think this through,
but I really tried to think more along the lines of "how can we make
this legible so that the code is actually understandable". Because I
think your patch almost certainly fixes the bug, but it sure isn't
easy to understand.

           Linus