Re: Patch "btrfs: qgroup: fix data leak caused by race between writeback and truncate" has been added to the 4.14-stable tree

Qu Wenruo <wqu@xxxxxxxx> · Sun, 26 Jul 2020 20:54:29 +0800

On 2020/7/26 8:51 PM, Qu Wenruo wrote:
> 
> 
> On 2020/7/26 8:41 PM, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
>>
>> This is a note to let you know that I've just added the patch titled
>>
>>     btrfs: qgroup: fix data leak caused by race between writeback and truncate
>>
>> to the 4.14-stable tree which can be found at:
>>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>>
>> The filename of the patch is:
>>      btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch
>> and it can be found in the queue-4.14 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable@xxxxxxxxxxxxxxx> know about it.
> 
> Please don't merge this patch for any of the stable branches.
> 
> This patch needs one unmerged patch ("btrfs: change timing for qgroup
> reserved space for ordered extents to fix reserved space leak", already
> in the maintainer's tree) as a prerequisite.

Also adding the btrfs mailing list to the discussion.

> 
> The behavior without that patch could be problematic.
> 
> I should have noticed this earlier.
> 
> Thanks,
> Qu
> 
>>
>>
>> From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001
>> From: Qu Wenruo <wqu@xxxxxxxx>
>> Date: Fri, 17 Jul 2020 15:12:05 +0800
>> Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate
>>
>> From: Qu Wenruo <wqu@xxxxxxxx>
>>
>> commit fa91e4aa1716004ea8096d5185ec0451e206aea0 upstream.
>>
>> [BUG]
>> When running tests like generic/013 on test device with btrfs quota
>> enabled, it can normally lead to data leak, detected at unmount time:
>>
>>   BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
>>   ------------[ cut here ]------------
>>   WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
>>   RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
>>   Call Trace:
>>    btrfs_put_super+0x15/0x17 [btrfs]
>>    generic_shutdown_super+0x72/0x110
>>    kill_anon_super+0x18/0x30
>>    btrfs_kill_super+0x17/0x30 [btrfs]
>>    deactivate_locked_super+0x3b/0xa0
>>    deactivate_super+0x40/0x50
>>    cleanup_mnt+0x135/0x190
>>    __cleanup_mnt+0x12/0x20
>>    task_work_run+0x64/0xb0
>>    __prepare_exit_to_usermode+0x1bc/0x1c0
>>    __syscall_return_slowpath+0x47/0x230
>>    do_syscall_64+0x64/0xb0
>>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   ---[ end trace caf08beafeca2392 ]---
>>   BTRFS error (device dm-3): qgroup reserved space leaked
>>
>> [CAUSE]
>> In the offending case, the offending operations are:
>> 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
>> 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
>>
>> The following sequence of events could happen after the writev():
>> 	CPU1 (writeback)		|		CPU2 (truncate)
>> -----------------------------------------------------------------
>> btrfs_writepages()			|
>> |- extent_write_cache_pages()		|
>>    |- Got page for 1003520		|
>>    |  1003520 is Dirty, no writeback	|
>>    |  So (!clear_page_dirty_for_io())   |
>>    |  gets called for it		|
>>    |- Now page 1003520 is Clean.	|
>>    |					| btrfs_setattr()
>>    |					| |- btrfs_setsize()
>>    |					|    |- truncate_setsize()
>>    |					|       New i_size is 18388
>>    |- __extent_writepage()		|
>>    |  |- page_offset() > i_size		|
>>       |- btrfs_invalidatepage()		|
>> 	 |- Page is clean, so no qgroup |
>> 	    callback executed
>>
>> This means, the qgroup reserved data space is not properly released in
>> btrfs_invalidatepage() as the page is Clean.
>>
>> [FIX]
>> Instead of checking the dirty bit of a page, call
>> btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
>>
>> As qgroup rsv are completely bound to the QGROUP_RESERVED bit of
>> io_tree, not bound to page status, thus we won't cause double freeing
>> anyway.
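
For anyone reading along on the list, here is a minimal userspace sketch of
the reasoning above. It is a toy model under stated assumptions, not kernel
code: every toy_* name is made up for illustration, one boolean stands in
for the QGROUP_RESERVED bit of io_tree, and another stands in for
PageDirty(). It only shows why a free that is keyed on the reserved bit can
be called unconditionally without double freeing, while a free gated on the
dirty bit leaks in the race described in [CAUSE].

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical model of one page-sized range; not a kernel structure. */
struct toy_range {
	bool qgroup_reserved;	/* stands in for the QGROUP_RESERVED bit */
	bool page_dirty;	/* stands in for PageDirty(page) */
};

/* Stands in for the qgroup's reserved data bytes. */
static unsigned long toy_qgroup_rsv;

/* Reserve space once per range, setting the reserved bit. */
static void toy_reserve(struct toy_range *r)
{
	if (!r->qgroup_reserved) {
		r->qgroup_reserved = true;
		toy_qgroup_rsv += TOY_PAGE_SIZE;
	}
}

/* Free only if the reserved bit is set; returns the bytes freed. */
static unsigned long toy_free(struct toy_range *r)
{
	if (!r->qgroup_reserved)
		return 0;	/* bit already cleared: nothing to free */
	assert(toy_qgroup_rsv >= TOY_PAGE_SIZE);
	r->qgroup_reserved = false;
	toy_qgroup_rsv -= TOY_PAGE_SIZE;
	return TOY_PAGE_SIZE;
}

/* Old behavior in this model: free only when the page is dirty. */
static unsigned long toy_invalidate_old(struct toy_range *r)
{
	return r->page_dirty ? toy_free(r) : 0;
}

/* New behavior in this model: free unconditionally. */
static unsigned long toy_invalidate_new(struct toy_range *r)
{
	return toy_free(r);
}

/*
 * Replay the race: the write reserves space and dirties the page,
 * writeback clears the dirty bit, then the page is invalidated.
 * Returns the bytes still reserved afterwards (the leak).
 */
unsigned long toy_replay(bool use_new)
{
	struct toy_range r = { false, false };

	toy_qgroup_rsv = 0;
	toy_reserve(&r);
	r.page_dirty = true;	/* the write dirtied the page */
	r.page_dirty = false;	/* writeback side cleared the dirty bit */

	if (use_new) {
		toy_invalidate_new(&r);
		toy_invalidate_new(&r);	/* second call frees nothing */
	} else {
		toy_invalidate_old(&r);	/* page is clean: skipped */
	}
	return toy_qgroup_rsv;
}
```

In this model, toy_replay(false) leaves 4096 bytes reserved, the same
amount as the "rsv 4096" warning in [BUG], while toy_replay(true) leaves 0
even though the free is called twice.
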
>>
>> Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
>> CC: stable@xxxxxxxxxxxxxxx # 4.14+
>> Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> Signed-off-by: David Sterba <dsterba@xxxxxxxx>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>
>> ---
>>  fs/btrfs/inode.c |   23 ++++++++++-------------
>>  1 file changed, 10 insertions(+), 13 deletions(-)
>>
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -9197,20 +9197,17 @@ again:
>>  	/*
>>  	 * Qgroup reserved space handler
>>  	 * Page here will be either
>> -	 * 1) Already written to disk
>> -	 *    In this case, its reserved space is released from data rsv map
>> -	 *    and will be freed by delayed_ref handler finally.
>> -	 *    So even we call qgroup_free_data(), it won't decrease reserved
>> -	 *    space.
>> -	 * 2) Not written to disk
>> -	 *    This means the reserved space should be freed here. However,
>> -	 *    if a truncate invalidates the page (by clearing PageDirty)
>> -	 *    and the page is accounted for while allocating extent
>> -	 *    in btrfs_check_data_free_space() we let delayed_ref to
>> -	 *    free the entire extent.
>> +	 * 1) Already written to disk or ordered extent already submitted
>> +	 *    Then its QGROUP_RESERVED bit in io_tree is already cleaned.
>> +	 *    Qgroup will be handled by its qgroup_record then.
>> +	 *    btrfs_qgroup_free_data() call will do nothing here.
>> +	 *
>> +	 * 2) Not written to disk yet
>> +	 *    Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED
>> +	 *    bit of its io_tree, and free the qgroup reserved data space.
>> +	 *    Since the IO will never happen for this page.
>>  	 */
>> -	if (PageDirty(page))
>> -		btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
>> +	btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
>>  	if (!inode_evicting) {
>>  		clear_extent_bit(tree, page_start, page_end,
>>  				 EXTENT_LOCKED | EXTENT_DIRTY |
>>
>>
>> Patches currently in stable-queue which might be from wqu@xxxxxxxx are
>>
>> queue-4.14/btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch
>>