On 2022/3/21 23:11, Jan Kara wrote: > Hello Yi! > > On Mon 21-03-22 22:38:49, Zhang Yi wrote: >> On 2022/3/21 19:37, Jan Kara wrote: >>> On Mon 21-03-22 19:34:08, Ye Bin wrote: >>>> We got issue as follows: >>>> [home]# fsck.ext4 -fn ram0yb >>>> e2fsck 1.45.6 (20-Mar-2020) >>>> Pass 1: Checking inodes, blocks, and sizes >>>> Pass 2: Checking directory structure >>>> Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. >>>> Clear? no >>>> Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). >>>> Fix? no >>>> >>>> As symlink file size not match to file content. If symlink data block >>>> writback failed, will call ext4_finish_bio to end io. In this path don't >>>> mark buffer error. When umount do checkpoint can't detect buffer error, >>>> then will cleanup jounral. Actually, correct data maybe in journal area. >>>> To solve this issue, mark buffer error when detect bio error in >>>> ext4_finish_bio. >>> >>> Thanks for the patch! Let me rephrase the text a bit: >>> >>> As the symlink file size does not match the file content. If the writeback >>> of the symlink data block failed, ext4_finish_bio() handles the end of IO. >>> However this function fails to mark the buffer with BH_write_io_error and >>> so when unmount does journal checkpoint it cannot detect the writeback >>> error and will cleanup the journal. Thus we've lost the correct data in the >>> journal area. To solve this issue, mark the buffer as BH_write_io_error in >>> ext4_finish_bio(). >>> >> >> Thinking about this issue in depth, the symlink data block is one kind of >> metadata, but the page mapping of such block is belongs to the ext4 inode, >> it's not coordinate to other metadata blocks, e.g. directory block and extents >> block. This is why we have already fix the same issue of other metadata blocks >> in commit fcf37549ae19e9 "jbd2: ensure abort the journal if detect IO error >> when writing original buffer back" but missing the case of symlink data block. >> So, after Ye Bin's fix, I think it's worth to unify the symlink data block >> mapping to bdev, any suggestions? > > Well, symlink with external block is essentially a case of data=journal > data block. So even if we would handle symlinks, we would still need to > deal with other inodes with journalled data. Also we need to keep the> symlink contents in the page cache to make it simple for generic VFS code > handling symlinks. So I don't see how we could substantially unify > things... > Yeah, this fix is still needed for other regular file's journalled data when we mounted filesystem with data=jouranl mode. But if we just consider whether if we could unify the journal mode of ext4's metadata blocks, it seems that using data=journal mode for symlink's external data block is also complicated and confused in the creating procedure. Instead, if we use ext4_bread(), it make things clear, and it seems also has no side effect of reading symlinks. I write a RFC patch to do this, please take a look at my latest mail "[RFC PATCH] ext4: convert symlink external data block mapping to bdev". Thanks, Yi.