Andreas Dilger <adilger@xxxxxxxxx> wrote: > > On Sep 20, 2024, at 9:00 AM, Jeongjun Park <aha310510@xxxxxxxxx> wrote: > > > > Currently, data-race like [1] occur in fs/ext4/ialloc.c > > > > find_group_other() and find_group_orlov() read multiple ext4_groups but > > do not protect them with locks, which causes data-race. I think it would > > be appropriate to add ext4_lock_group() at an appropriate location to solve > > this. > > > > [1] > > > > ================================================================== > > BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set > > > > write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1: > > ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405 > > __ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216 > > ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 > > vfs_symlink+0xca/0x1d0 fs/namei.c:4615 > > do_symlinkat+0xe3/0x340 fs/namei.c:4641 > > __do_sys_symlinkat fs/namei.c:4657 [inline] > > __se_sys_symlinkat fs/namei.c:4654 [inline] > > __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 > > x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0: > > ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349 > > find_group_other fs/ext4/ialloc.c:594 [inline] > > __ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017 > > ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 > > vfs_symlink+0xca/0x1d0 fs/namei.c:4615 > > do_symlinkat+0xe3/0x340 fs/namei.c:4641 > > __do_sys_symlinkat fs/namei.c:4657 [inline] > > __se_sys_symlinkat fs/namei.c:4654 [inline] > > __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 > > x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > value changed: 0x185c -> 0x185b > > I see now after you sent this patch that all of these cases are for > read-only access to the free inodes count, which doesn't really > matter if the value is racy. These values are only used in heuristics > for block group selection, and if the value is wrong then creating a > new subdirectory may be in a different group, but that doesn't make > much difference. > > It looks like the write side of all these accesses are already under > ext4_group_lock(), so the code is actually correct and not in danger > of two threads updating bg_free_inodes_count_lo/hi inconsistently. > > We probably *do not* want locking in the read case, as it will cause > unnecessary lock contention scanning groups for subdirectory allocation. > > My suggestion at this point would be to go back to using READ_ONCE() and > WRITE_ONCE() in ext4_free_inodes_count()/ext4_free_inodes_set() like in > your original patch. but *only* for functions used by find_group_*(). We > want to be warned by KASAN if any of the other fields are accessed without > a proper ext4_group_lock(), since READ_ONCE()/WRITE_ONCE() does not fix > _lo/_hi tearing. > > It probably also makes sense to add comments to all of these functions > that they should hold ext4_group_lock() for access/updates, *except* > ext4_free_inodes_count() can be called to read the inode count without it > if the result does not need to be totally accurate. Thanks for the comment! Then it seems appropriate to annotate ext4_free_inodes_count() called in find_group_* so that KCSAN does not report it. I will write a v3 patch that implements this and send it to you right away. Regards, Jeongjun Park > > Cheers, Andreas > > > Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") > > Signed-off-by: Jeongjun Park <aha310510@xxxxxxxxx> > > --- > > fs/ext4/ialloc.c | 27 ++++++++++++++++++++++++--- > > 1 file changed, 24 insertions(+), 3 deletions(-) > > > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > > index 9dfd768ed9f8..5cae247ff21f 100644 > > --- a/fs/ext4/ialloc.c > > +++ b/fs/ext4/ialloc.c > > @@ -500,11 +500,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, > > for (i = 0; i < flex_size; i++) { > > if (grp+i >= real_ngroups) > > break; > > + ext4_lock_group(sb, grp+i); > > desc = ext4_get_group_desc(sb, grp+i, NULL); > > if (desc && ext4_free_inodes_count(sb, desc)) { > > *group = grp+i; > > + ext4_unlock_group(sb, grp+i); > > return 0; > > } > > + ext4_unlock_group(sb, grp+i); > > } > > goto fallback; > > } > > @@ -544,14 +547,17 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, > > parent_group = EXT4_I(parent)->i_block_group; > > for (i = 0; i < ngroups; i++) { > > grp = (parent_group + i) % ngroups; > > + ext4_lock_group(sb, grp); > > desc = ext4_get_group_desc(sb, grp, NULL); > > if (desc) { > > grp_free = ext4_free_inodes_count(sb, desc); > > if (grp_free && grp_free >= avefreei) { > > *group = grp; > > + ext4_unlock_group(sb, grp); > > return 0; > > } > > } > > + ext4_unlock_group(sb, grp); > > } > > > > if (avefreei) { > > @@ -590,11 +596,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent, > > if (last > ngroups) > > last = ngroups; > > for (i = parent_group; i < last; i++) { > > + ext4_lock_group(sb, i); > > desc = ext4_get_group_desc(sb, i, NULL); > > if (desc && ext4_free_inodes_count(sb, desc)) { > > *group = i; > > + ext4_unlock_group(sb, i); > > return 0; > > } > > + ext4_unlock_group(sb, i); > > } > > if (!retry && EXT4_I(parent)->i_last_alloc_group != ~0) { > > retry = 1; > > @@ -616,10 +625,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent, > > * Try to place the inode in its parent directory > > */ > > *group = parent_group; > > + ext4_lock_group(sb, *group); > > desc = ext4_get_group_desc(sb, *group, NULL); > > if (desc && ext4_free_inodes_count(sb, desc) && > > - ext4_free_group_clusters(sb, desc)) > > + ext4_free_group_clusters(sb, desc)) { > > + ext4_unlock_group(sb, *group); > > return 0; > > + } > > + ext4_unlock_group(sb, *group); > > > > /* > > * We're going to place this inode in a different blockgroup from its > > @@ -640,10 +653,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent, > > *group += i; > > if (*group >= ngroups) > > *group -= ngroups; > > + ext4_lock_group(sb, *group); > > desc = ext4_get_group_desc(sb, *group, NULL); > > if (desc && ext4_free_inodes_count(sb, desc) && > > - ext4_free_group_clusters(sb, desc)) > > + ext4_free_group_clusters(sb, desc)) { > > + ext4_unlock_group(sb, *group); > > return 0; > > + } > > + ext4_unlock_group(sb, *group); > > } > > > > /* > > @@ -654,9 +671,13 @@ static int find_group_other(struct super_block *sb, struct inode *parent, > > for (i = 0; i < ngroups; i++) { > > if (++*group >= ngroups) > > *group = 0; > > + ext4_lock_group(sb, *group); > > desc = ext4_get_group_desc(sb, *group, NULL); > > - if (desc && ext4_free_inodes_count(sb, desc)) > > + if (desc && ext4_free_inodes_count(sb, desc)) { > > + ext4_unlock_group(sb, *group); > > return 0; > > + } > > + ext4_unlock_group(sb, *group); > > } > > > > return -1; > > -- > > > Cheers, Andreas > > > > >