On Thu, May 04, 2023 at 12:34:25AM +0800, Yue Zhao wrote:
> Recently we found a bug related with ext4 buffer head is fixed by
> commit 0b73284c564d("ext4: ext4_read_bh_lock() should submit IO if the
> buffer isn't uptodate")[1].
>
> This bug is fixed on some kernel long term versions, such as 5.10 and 5.15.
> However, on 5.4 stable version, we can still easily reproduce this bug by
> adding some delay after buffer_migrate_lock_buffers() in __buffer_migrate_page()
> and do fsstress on the ext4 filesystem. We can get some errors in dmesg like:
>
> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193:
> comm fsstress: reading directory lblock 0
> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334:
> comm fsstress: reading directory lblock 0
>
> About how to fix this bug in 5.4 version, currently I have three ideas.
> But I don't know which one is better or is there any other feasible way to
> fix this bug elegantly based on the 5.4 stable branch?
>
> The first idea comes from this thread[2]. In __buffer_migrate_page(),
> we can let it fallback to migrate_page that are not uptodate like
> fallback_migrate_page(), those pages that has buffers may probably do
> read operation soon. From [3], we can see this solution is not good enough
> because there are other places that lock the buffer without doing IO.
> I think this solution can be a candidate option to fix if we do not want to
> change a lot. Also based on my test results, the ext4 filesystem remains
> stable after one week stress test with this patch applied.
>
> The second idea is backport a series of commits from upstream, such as
>
> 2d069c0889ef ("ext4: use common helpers in all places reading metadata buffers")
> 0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate")
> 79f597842069 ("fs/buffer: remove ll_rw_block() helper")

Backporting the original upstream commits is almost always the correct solution.
Please try doing that instead of a one-off patch like this.

thanks,

greg k-h