Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 12, 2022 at 08:30:13PM +0800, Qu Wenruo wrote:
> [BUG]
> When running generic/475 with 64K page size and 4K sector size, it has a
> very high chance (almost 100%) to hang, with mostly data page locked but
> no one is going to unlock it.
> 
> [CAUSE]
> With commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> reads"), if we failed to lookup checksum due to metadata IO error, we
> will return error for btrfs_submit_data_bio().
> 
> This will cause the page to be unlocked twice in btrfs_do_readpage():
> 
>  btrfs_do_readpage()
>  |- submit_extent_page()
>  |  |- submit_one_bio()
>  |     |- btrfs_submit_data_bio()
>  |        |- if (ret) {
>  |        |-     bio->bi_status = ret;
>  |        |-     bio_endio(bio); }
>  |               In the endio function, we will call end_page_read()
>  |               and unlock_extent() to cleanup the subpage range.
>  |
>  |- if (ret) {
>  |-        unlock_extent(); end_page_read() }
>            Here we unlock the extent and cleanup the subpage range
>            again.
> 
> For unlock_extent(), it's mostly double unlock safe.
> 
> But for end_page_read(), it's not, especially for subpage case,
> as for subpage case we will call btrfs_subpage_end_reader() to reduce
> the reader number, and use that to number to determine if we need to
> unlock the full page.
> 
> If double accounted, it can underflow the number and leave the page
> locked without anyone to unlock it.
> 
> [FIX]
> The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> reads") itself is completely fine, it's our existing code not properly
> handling the error from bio submission hook properly.
> 
> This patch will make submit_one_bio() to return void so that the callers
> will never be able to do cleanup when bio submission hook fails.
> 
> CC: stable@xxxxxxxxxxxxxxx # 5.18+

BTW stable tags are only for released kernels, if it's still in some rc
then Fixes: is appropriate.



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux