[BUG]
Test case generic/475 has a very high chance (almost 100%) of hitting a
fs hang, where a data page is never unlocked and hangs all later
operations.

[CAUSE]
In btrfs_do_readpage(), if we hit an error from submit_extent_page() we
will try to do the cleanup for our current io range, and exit.

This works fine for PAGE_SIZE == sectorsize cases, but not for subpage.

For subpage, btrfs_do_readpage() will lock the full page first, which
can contain several different sectors and extents:

 btrfs_do_readpage()
 |- begin_page_read()
 |  |- btrfs_subpage_start_reader();
 |     Now the page will have PAGE_SIZE / sectorsize readers pending,
 |     and the page is locked.
 |
 |- end_page_read() for different branches
 |  This function will reduce subpage readers, and when readers
 |  reach 0, it will unlock the page.

But when submit_extent_page() fails, we only clean up the current io
range, while the remaining io range will never be cleaned up, and the
page remains locked forever.

[FIX]
Update the error handling of submit_extent_page() to clean up all the
remaining subpage range before exiting the loop.

CC: stable@xxxxxxxxxxxxxxx # 5.15+
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
 fs/btrfs/extent_io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 34073b0ed6ca..8de25ce05606 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3735,8 +3735,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					 this_bio_flag,
 					 force_bio_submit);
 		if (ret) {
-			unlock_extent(tree, cur, cur + iosize - 1);
-			end_page_read(page, false, cur, iosize);
+			/*
+			 * We have to unlock the remaining range, or the page
+			 * will never be unlocked.
+			 */
+			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end + 1 - cur);
 			goto out;
 		}
 		cur = cur + iosize;
-- 
2.35.1
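
The reader accounting described in [CAUSE] can be illustrated with a small
userspace model. This is only a sketch and not kernel code: the toy_* names
are hypothetical, a successful submission is assumed to complete at once
(in the kernel the reader is dropped later, from the bio end io path), and
each io range is assumed to cover exactly one sector.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a subpage data page; not the kernel's struct page. */
struct toy_page {
	unsigned int readers;	/* sectors still waiting for end_page_read() */
	bool locked;
};

/* Models begin_page_read(): lock the page, one pending reader per sector. */
static void toy_begin_read(struct toy_page *p, unsigned int nr_sectors)
{
	p->readers = nr_sectors;
	p->locked = true;
}

/* Models end_page_read(): drop nr readers, unlock once none remain. */
static void toy_end_read(struct toy_page *p, unsigned int nr)
{
	assert(nr <= p->readers);
	p->readers -= nr;
	if (p->readers == 0)
		p->locked = false;
}

/*
 * Models the read loop with a submission failure at sector fail_at
 * (pass fail_at == nr_sectors for "no failure").
 *
 * cleanup_rest == false: release only the current sector (old behavior).
 * cleanup_rest == true:  release every remaining sector (patched behavior).
 *
 * Returns true if the page is still locked when the loop exits.
 */
static bool toy_read_page(unsigned int nr_sectors, unsigned int fail_at,
			  bool cleanup_rest)
{
	struct toy_page p;
	unsigned int cur;

	toy_begin_read(&p, nr_sectors);
	for (cur = 0; cur < nr_sectors; cur++) {
		if (cur == fail_at) {
			toy_end_read(&p, cleanup_rest ? nr_sectors - cur : 1);
			break;
		}
		/* Assumption: a successful read finishes immediately. */
		toy_end_read(&p, 1);
	}
	return p.locked;
}
```

With 16 sectors per page (for example a 64K page with 4K sectors, chosen
here only for illustration), toy_read_page(16, 3, false) leaves the page
locked, while toy_read_page(16, 3, true) unlocks it. A failure on the last
sector unlocks the page in both modes, which matches the observation that
only a failure with sectors still remaining triggers the hang.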