On Fri 11-10-24 10:18:04, Baokun Li wrote: > On 2024/10/9 23:50, Jan Kara wrote: > > > Or go one step further and add a mechanism like xfs Reverse-Mapping, which > > > makes sure that allocated blocks do point to the target inode, which could > > > replace the current block_validity, and could also be used in future online > > > fscks. > > Well, that is a rather big change. It requires significant on-disk format > > change and also performance cost when to maintain. Furthermore for xattr > > blocks which can be shared by many inodes it is not even clear how to > > implement this... So I'm not sure we really want to do this either. > > Yes, there can be a lot of work involved. > > * Perhaps we could create an rmap file to store the rmap tree to avoid > on-disk format changes. > * The performance impact of maintaining rmap really needs to be evaluated, > perhaps by writing a simple DEMO to test it. > * XFS supports shared blocks(A.K.A. reflink.), so even if the physical > blocks are the same, but the inodes are different or the logical blocks > are different, they will be recorded multiple times in the tree. So the > shared xattr block can be handled similarly. > > We have plans to support online fsck in the future, and implementing rmap > is one of the steps. Perhaps one can wait until rmap is implemented to > assess whether it is worth a strict check here. Yes, we could implement something like this be as you wrote, it's going to be a lot of work. We've briefly discussed this with Ted on ext4 call and we came to a conclusion that this is a type of corruption ext4 may never protect agaist. You simply should not mount arbitrarily corrupted filesystems... But if you want to try, sure go ahead :) One relatively easy solution to similar class of problems would be to store the type of metadata buffer inside the buffer_head when we are verifying checksum, clear the info when freeing the block in __ext4_forget(), and fail with EFSCORRUPTED error when one type -> another type transition would happen. > Implementing rmap may take some time, until then we can avoid the problem > as much as possible by checking the magic and xattr block csum. > Maybe something like this? > > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c > index 7647e9f6e190..cd3ae1e3371c 100644 > --- a/fs/ext4/xattr.c > +++ b/fs/ext4/xattr.c > @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct > ext4_xattr_info *i, > } > } > > + if (WARN_ON_ONCE(last < here)) { > + EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s", > + is_block ? "block" : "ibody"); > + ret = -EFSCORRUPTED; > + goto out; > + } > + > /* Check whether we have enough space. */ > if (i->value) { > size_t free; > @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode > *inode, > } > > if (s->base) { > + struct ext4_xattr_header *hdr; > int offset = (char *)s->here - bs->bh->b_data; > > BUFFER_TRACE(bs->bh, "get_write_access"); > @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode > *inode, > goto cleanup; > > lock_buffer(bs->bh); > + hdr = header(s->base); > + > + if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) || > + (ext4_has_metadata_csum(inode->i_sb) && > + (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr) > != > + hdr->h_checksum))) { > + unlock_buffer(bs->bh); > + error = -EFSCORRUPTED; > + goto bad_block; > + } > > if (header(s->base)->h_refcount == cpu_to_le32(1)) { > __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash); Hum, there are more places in xattr code that access a buffer that could have been modified. So why do you add check into this place? Is it somehow special? Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR