On Sat 16-07-22 11:00:46, Zhihao Cheng wrote: > 在 2022/7/12 18:54, Jan Kara 写道: > Hi Jan, one little question confuses me: > > When ext4_xattr_block_set() decides to remove xattr block the following > > race can happen: > > > > CPU1 CPU2 > > ext4_xattr_block_set() ext4_xattr_release_block() > > new_bh = ext4_xattr_block_cache_find() > > > > lock_buffer(bh); > > ref = le32_to_cpu(BHDR(bh)->h_refcount); > > if (ref == 1) { > > ... > > mb_cache_entry_delete(); > > unlock_buffer(bh); > > ext4_free_blocks(); > > ... > > ext4_forget(..., bh, ...); > > jbd2_journal_revoke(..., bh); > > > > ext4_journal_get_write_access(..., new_bh, ...) > > do_get_write_access() > > jbd2_journal_cancel_revoke(..., new_bh); > > > > Later the code in ext4_xattr_block_set() finds out the block got freed > > and cancels reusal of the block but the revoke stays canceled and so in > > case of block reuse and journal replay the filesystem can get corrupted. > > If the race works out slightly differently, we can also hit assertions > > in the jbd2 code. > > > > Fix the problem by making sure that once matching mbcache entry is > > found, code dropping the last xattr block reference (or trying to modify > > xattr block in place) waits until the mbcache entry reference is > > dropped. This way code trying to reuse xattr block is protected from > > someone trying to drop the last reference to xattr block. > > > > Reported-and-tested-by: Ritesh Harjani <ritesh.list@xxxxxxxxx> > > CC: stable@xxxxxxxxxxxxxxx > > Fixes: 82939d7999df ("ext4: convert to mbcache2") > > Signed-off-by: Jan Kara <jack@xxxxxxx> ... > > @@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, > > lock_buffer(new_bh); > > /* > > * We have to be careful about races with > > - * freeing, rehashing or adding references to > > - * xattr block. Once we hold buffer lock xattr > > - * block's state is stable so we can check > > - * whether the block got freed / rehashed or > > - * not. Since we unhash mbcache entry under > > - * buffer lock when freeing / rehashing xattr > > - * block, checking whether entry is still > > - * hashed is reliable. Same rules hold for > > - * e_reusable handling. > > + * adding references to xattr block. Once we > > + * hold buffer lock xattr block's state is > > + * stable so we can check the additional > > + * reference fits. > > */ > > - if (hlist_bl_unhashed(&ce->e_hash_list) || > > - !ce->e_reusable) { > > + ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1; > > + if (ref > EXT4_XATTR_REFCOUNT_MAX) { > > So far, we have mb_cache_entry_delete_or_get() and > mb_cache_entry_wait_unused(), so used cache entry cannot be concurrently > removed. Removing check 'hlist_bl_unhashed(&ce->e_hash_list)' is okay. > > What's affect of changing the other two checks 'ref >= > EXT4_XATTR_REFCOUNT_MAX' and '!ce->e_reusable'? To make code more obvious? > eg. To point out the condition of 'ref <= EXT4_XATTR_REFCOUNT_MAX' rather > than 'ce->e_reusable', we have checked 'ce->e_reusable' in > ext4_xattr_block_cache_find() before? Well, ce->e_reusable is set if and only if BHDR(new_bh)->h_refcount < EXT4_XATTR_REFCOUNT_MAX. So checking whether the refcount is small enough is all that is needed and we don't need the ce->e_reusable check here. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR