> On Nov 11, 2024, at 10:43 PM, yangerkun <yangerkun@xxxxxxxxxxxxxxx> wrote: > > > > 在 2024/11/11 23:34, Chuck Lever III 写道: >>> On Nov 11, 2024, at 10:20 AM, yangerkun <yangerkun@xxxxxxxxxxxxxxx> wrote: >>> >>> >>> >>> 在 2024/11/11 22:39, Chuck Lever III 写道: >>>>> On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: >>>>> >>>>> Hi, >>>>> >>>>> 在 2024/11/11 8:52, cel@xxxxxxxxxx 写道: >>>>>> From: yangerkun <yangerkun@xxxxxxxxxx> >>>>>> [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] >>>>>> After we switch tmpfs dir operations from simple_dir_operations to >>>>>> simple_offset_dir_operations, every rename happened will fill new dentry >>>>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free >>>>>> key starting with octx->newx_offset, and then set newx_offset equals to >>>>>> free key + 1. This will lead to infinite readdir combine with rename >>>>>> happened at the same time, which fail generic/736 in xfstests(detail show >>>>>> as below). >>>>>> 1. create 5000 files(1 2 3...) under one dir >>>>>> 2. call readdir(man 3 readdir) once, and get one entry >>>>>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) >>>>>> 4. loop 2~3, until readdir return nothing or we loop too many >>>>>> times(tmpfs break test with the second condition) >>>>>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite >>>>>> directory reads") to fix it, record the last_index when we open dir, and >>>>>> do not emit the entry which index >= last_index. The file->private_data >>>>> >>>>> Please notice this requires last_index should never overflow, otherwise >>>>> readdir will be messed up. >>>> It would help your cause if you could be more specific >>>> than "messed up". >>>>>> now used in offset dir can use directly to do this, and we also update >>>>>> the last_index when we llseek the dir file. >>>>>> Fixes: a2e459555c5f ("shmem: stable directory offsets") >>>>>> Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx> >>>>>> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@xxxxxxxxxx >>>>>> Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>> [brauner: only update last_index after seek when offset is zero like Jan suggested] >>>>>> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> >>>>>> Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 >>>>>> [ cel: adjusted to apply to origin/linux-6.6.y ] >>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>> --- >>>>>> fs/libfs.c | 37 +++++++++++++++++++++++++------------ >>>>>> 1 file changed, 25 insertions(+), 12 deletions(-) >>>>>> diff --git a/fs/libfs.c b/fs/libfs.c >>>>>> index a87005c89534..b59ff0dfea1f 100644 >>>>>> --- a/fs/libfs.c >>>>>> +++ b/fs/libfs.c >>>>>> @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx) >>>>>> xa_destroy(&octx->xa); >>>>>> } >>>>>> +static int offset_dir_open(struct inode *inode, struct file *file) >>>>>> +{ >>>>>> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >>>>>> + >>>>>> + file->private_data = (void *)ctx->next_offset; >>>>>> + return 0; >>>>>> +} >>>>> >>>>> Looks like xarray is still used. >>>> That's not going to change, as several folks have already >>>> explained. >>>>> I'm in the cc list ,so I assume you saw my set, then I don't know why >>>>> you're ignoring my concerns. >>>>> 1) next_offset is 32-bit and can overflow in a long-time running >>>>> machine. >>>>> 2) Once next_offset overflows, readdir will skip the files that offset >>>>> is bigger. >>> >>> I'm sorry, I'm a little busy these days, so I haven't responded to this >>> series of emails. >>> >>>> In that case, that entry won't be visible via getdents(3) >>>> until the directory is re-opened or the process does an >>>> lseek(fd, 0, SEEK_SET). >>> >>> Yes. >>> >>>> That is the proper and expected behavior. I suspect you >>>> will see exactly that behavior with ext4 and 32-bit >>>> directory offsets, for example. >>> >>> Emm... >>> >>> For this case like this: >>> >>> 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2 >>> 2. open /tmp/dir with fd1 >>> 3. readdir and get /tmp/dir/file1 >>> 4. rm /tmp/dir/file2 >>> 5. touch /tmp/dir/file2 >>> 4. loop 4~5 for 2^32 times >>> 5. readdir /tmp/dir with fd1 >>> >>> For tmpfs now, we may see no /tmp/dir/file2, since the offset has been overflow, for ext4 it is ok... So we think this will be a problem. >>> >>>> Does that not directly address your concern? Or do you >>>> mean that Erkun's patch introduces a new issue? >>> >>> Yes, to be honest, my personal feeling is a problem. But for 64bit, it may never been trigger. >> Thanks for confirming. >> In that case, the preferred way to handle it is to fix >> the issue in upstream, and then backport that fix to LTS. >> Dependence on 64-bit offsets to avoid a failure case >> should be considered a workaround, not a real fix, IMHO. > > Yes. > >> Do you have a few moments to address it, or if not I >> will see to it. > > You can try to do this, for the reason I am quite busy now until end of this month... Sorry. No worries! >> I think reducing the xa_limit in simple_offset_add() to, >> say, 2..16 would make the reproducer fire almost >> immediately. > > Yes. > >>>> If there is a problem here, please construct a reproducer >>>> against this patch set and post it. >>>>> Thanks, >>>>> Kuai >>>>> >>>>>> + >>>>>> /** >>>>>> * offset_dir_llseek - Advance the read position of a directory descriptor >>>>>> * @file: an open directory whose position is to be updated >>>>>> @@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx) >>>>>> */ >>>>>> static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >>>>>> { >>>>>> + struct inode *inode = file->f_inode; >>>>>> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >>>>>> + >>>>>> switch (whence) { >>>>>> case SEEK_CUR: >>>>>> offset += file->f_pos; >>>>>> @@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >>>>>> } >>>>>> /* In this case, ->private_data is protected by f_pos_lock */ >>>>>> - file->private_data = NULL; >>>>>> - return vfs_setpos(file, offset, U32_MAX); >>>>>> + if (!offset) >>>>>> + file->private_data = (void *)ctx->next_offset; >>>>>> + return vfs_setpos(file, offset, LONG_MAX); >>>>>> } >>>>>> static struct dentry *offset_find_next(struct xa_state *xas) >>>>>> @@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) >>>>>> inode->i_ino, fs_umode_to_dtype(inode->i_mode)); >>>>>> } >>>>>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>>>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) >>>>>> { >>>>>> struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode); >>>>>> XA_STATE(xas, &so_ctx->xa, ctx->pos); >>>>>> @@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>>>> while (true) { >>>>>> dentry = offset_find_next(&xas); >>>>>> if (!dentry) >>>>>> - return ERR_PTR(-ENOENT); >>>>>> + return; >>>>>> + >>>>>> + if (dentry2offset(dentry) >= last_index) { >>>>>> + dput(dentry); >>>>>> + return; >>>>>> + } >>>>>> if (!offset_dir_emit(ctx, dentry)) { >>>>>> dput(dentry); >>>>>> - break; >>>>>> + return; >>>>>> } >>>>>> dput(dentry); >>>>>> ctx->pos = xas.xa_index + 1; >>>>>> } >>>>>> - return NULL; >>>>>> } >>>>>> /** >>>>>> @@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>>>> static int offset_readdir(struct file *file, struct dir_context *ctx) >>>>>> { >>>>>> struct dentry *dir = file->f_path.dentry; >>>>>> + long last_index = (long)file->private_data; >>>>>> lockdep_assert_held(&d_inode(dir)->i_rwsem); >>>>>> if (!dir_emit_dots(file, ctx)) >>>>>> return 0; >>>>>> - /* In this case, ->private_data is protected by f_pos_lock */ >>>>>> - if (ctx->pos == DIR_OFFSET_MIN) >>>>>> - file->private_data = NULL; >>>>>> - else if (file->private_data == ERR_PTR(-ENOENT)) >>>>>> - return 0; >>>>>> - file->private_data = offset_iterate_dir(d_inode(dir), ctx); >>>>>> + offset_iterate_dir(d_inode(dir), ctx, last_index); >>>>>> return 0; >>>>>> } >>>>>> const struct file_operations simple_offset_dir_operations = { >>>>>> + .open = offset_dir_open, >>>>>> .llseek = offset_dir_llseek, >>>>>> .iterate_shared = offset_readdir, >>>>>> .read = generic_read_dir, >>>> -- >>>> Chuck Lever >> -- >> Chuck Lever -- Chuck Lever