Re: [PATCH 6.6 00/28] fix CVE-2024-46701

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Mon, 11 Nov 2024 08:56:50 +0800

Hi,

在 2024/11/10 0:58, Chuck Lever III 写道:

On Nov 8, 2024, at 8:30 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2024/11/08 21:23, Chuck Lever III 写道:
On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2024/11/07 22:41, Chuck Lever 写道:
On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
Hi,

在 2024/11/06 23:19, Chuck Lever III 写道:

On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Fix patch is patch 27, relied patches are from:

I assume patch 27 is:

libfs: fix infinite directory reads for offset dir

https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@xxxxxxxxxxxxxxx/

I don't think the Maple tree patches are a hard
requirement for this fix. And note that libfs did
not use Maple tree originally because I was told
at that time that Maple tree was not yet mature.

So, a better approach might be to fit the fix
onto linux-6.6.y while sticking with xarray.

The painful part is that using xarray is not acceptable, the offet
is just 32 bit and if it overflows, readdir will read nothing. That's
why maple_tree has to be used.
A 32-bit range should be entirely adequate for this usage.
  - The offset allocator wraps when it reaches the maximum, it
    doesn't overflow unless there are actually billions of extant
    entries in the directory, which IMO is not likely.

Yes, it's not likely, but it's possible, and not hard to trigger for
test.
I question whether such a test reflects any real-world
workload.
Besides, there are a number of other limits that will impact
the ability to create that many entries in one directory.
The number of inodes in one tmpfs instance is limited, for
instance.
And please notice that the offset will increase for each new file,
and file can be removed, while offset stays the same.

Did you see the above explanation? files can be removed, you don't have
to store that much files to trigger the offset to overflow.
  - The offset values are dense, so the directory can use all 2- or
    4- billion in the 32-bit integer range before wrapping.

A simple math, if user create and remove 1 file in each seconds, it will
cost about 130 years to overflow. And if user create and remove 1000
files in each second, it will cost about 1 month to overflow.

The problem is that if the next_offset overflows to 0, then after patch
27, offset_dir_open() will record the 0, and later offset_readdir will
return directly, while there can be many files.

Let me revisit this for a moment. The xa_alloc_cyclic() call
in simple_offset_add() has a range limit argument of 2 - U32_MAX.

So I'm not clear how an overflow (or, more precisely, the
reuse of an offset value) would result in a "0" offset being
recorded. The range limit prevents the use of 0 and 1.

A "0" offset value would be a bug, I agree, but I don't see
how that can happen.

The question is what happens when there are no more offset
values available. xa_alloc_cyclic should fail, and file
creation is supposed to fail at that point. If it doesn't,
that's a bug that is outside of the use of xarray or Maple.

Can you show me the code that xa_alloc_cyclic should fail? At least
according to the commets, it will return 1 if the allocation succeeded
after wrapping.

* Context: Any context.  Takes and releases the xa_lock.  May sleep if
* the @gfp flags permit.
* Return: 0 if the allocation succeeded without wrapping.  1 if the
* allocation succeeded after wrapping, -ENOMEM if memory could not be
* allocated or -EBUSY if there are no free entries in @limit.
*/
static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry,
struct xa_limit limit, u32 *next, gfp_t gfp)

I recall (dimly) that directory entry offset value re-use
is acceptable and preferred, so I think ignoring a "1"
return value from xa_alloc_cyclic() is OK. If there are
no unused offset values available, it will return -EBUSY,
and file creation will fail.

Perhaps Christian or Al can chime in here on whether
directory entry offset value re-use is indeed expected
to be acceptable.

This can't be acceptable in this case, the reason is straightforward,
it will mess readdir, and this is mucth more serious than the cve
itself.

Thanks,
Kuai

Further, my understanding is that:

https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@xxxxxxxxxxxxxxx/

fixes a rename issue that results in an infinite loop,
and that's the (only) issue that underlies CVE-2024-46701.

You are suggesting that there are other overflow problems
with the xarray-based simple_offset implementation. If I
can confirm them, then I can get these fixed in v6.6. But
so far, I'm not sure I completely understand these other
failure modes.

Are you suggesting that the above fix /introduces/ the
0 offset problem?

--
Chuck Lever