> On Nov 8, 2024, at 8:30 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2024/11/08 21:23, Chuck Lever III 写道: >>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: >>> >>> Hi, >>> >>> 在 2024/11/07 22:41, Chuck Lever 写道: >>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote: >>>>> Hi, >>>>> >>>>> 在 2024/11/06 23:19, Chuck Lever III 写道: >>>>>> >>>>>> >>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: >>>>>>> >>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote: >>>>>>>> From: Yu Kuai <yukuai3@xxxxxxxxxx> >>>>>>>> >>>>>>>> Fix patch is patch 27, relied patches are from: >>>>>> >>>>>> I assume patch 27 is: >>>>>> >>>>>> libfs: fix infinite directory reads for offset dir >>>>>> >>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@xxxxxxxxxxxxxxx/ >>>>>> >>>>>> I don't think the Maple tree patches are a hard >>>>>> requirement for this fix. And note that libfs did >>>>>> not use Maple tree originally because I was told >>>>>> at that time that Maple tree was not yet mature. >>>>>> >>>>>> So, a better approach might be to fit the fix >>>>>> onto linux-6.6.y while sticking with xarray. >>>>> >>>>> The painful part is that using xarray is not acceptable, the offet >>>>> is just 32 bit and if it overflows, readdir will read nothing. That's >>>>> why maple_tree has to be used. >>>> A 32-bit range should be entirely adequate for this usage. >>>> - The offset allocator wraps when it reaches the maximum, it >>>> doesn't overflow unless there are actually billions of extant >>>> entries in the directory, which IMO is not likely. >>> >>> Yes, it's not likely, but it's possible, and not hard to trigger for >>> test. >> I question whether such a test reflects any real-world >> workload. >> Besides, there are a number of other limits that will impact >> the ability to create that many entries in one directory. >> The number of inodes in one tmpfs instance is limited, for >> instance. >>> And please notice that the offset will increase for each new file, >>> and file can be removed, while offset stays the same. > > Did you see the above explanation? files can be removed, you don't have > to store that much files to trigger the offset to overflow. >>>> - The offset values are dense, so the directory can use all 2- or >>>> 4- billion in the 32-bit integer range before wrapping. >>> >>> A simple math, if user create and remove 1 file in each seconds, it will >>> cost about 130 years to overflow. And if user create and remove 1000 >>> files in each second, it will cost about 1 month to overflow. > The problem is that if the next_offset overflows to 0, then after patch > 27, offset_dir_open() will record the 0, and later offset_readdir will > return directly, while there can be many files. Let me revisit this for a moment. The xa_alloc_cyclic() call in simple_offset_add() has a range limit argument of 2 - U32_MAX. So I'm not clear how an overflow (or, more precisely, the reuse of an offset value) would result in a "0" offset being recorded. The range limit prevents the use of 0 and 1. A "0" offset value would be a bug, I agree, but I don't see how that can happen. >> The question is what happens when there are no more offset >> values available. xa_alloc_cyclic should fail, and file >> creation is supposed to fail at that point. If it doesn't, >> that's a bug that is outside of the use of xarray or Maple. > > Can you show me the code that xa_alloc_cyclic should fail? At least > according to the commets, it will return 1 if the allocation succeeded > after wrapping. > > * Context: Any context. Takes and releases the xa_lock. May sleep if > * the @gfp flags permit. > * Return: 0 if the allocation succeeded without wrapping. 1 if the > * allocation succeeded after wrapping, -ENOMEM if memory could not be > * allocated or -EBUSY if there are no free entries in @limit. > */ > static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry, > struct xa_limit limit, u32 *next, gfp_t gfp) I recall (dimly) that directory entry offset value re-use is acceptable and preferred, so I think ignoring a "1" return value from xa_alloc_cyclic() is OK. If there are no unused offset values available, it will return -EBUSY, and file creation will fail. Perhaps Christian or Al can chime in here on whether directory entry offset value re-use is indeed expected to be acceptable. Further, my understanding is that: https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@xxxxxxxxxxxxxxx/ fixes a rename issue that results in an infinite loop, and that's the (only) issue that underlies CVE-2024-46701. You are suggesting that there are other overflow problems with the xarray-based simple_offset implementation. If I can confirm them, then I can get these fixed in v6.6. But so far, I'm not sure I completely understand these other failure modes. Are you suggesting that the above fix /introduces/ the 0 offset problem? -- Chuck Lever