On 2020/7/26 下午8:51, Qu Wenruo wrote: > > > On 2020/7/26 下午8:41, gregkh@xxxxxxxxxxxxxxxxxxx wrote: >> >> This is a note to let you know that I've just added the patch titled >> >> btrfs: qgroup: fix data leak caused by race between writeback and truncate >> >> to the 4.14-stable tree which can be found at: >> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary >> >> The filename of the patch is: >> btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch >> and it can be found in the queue-4.14 subdirectory. >> >> If you, or anyone else, feels it should not be added to the stable tree, >> please let <stable@xxxxxxxxxxxxxxx> know about it. > > Please don't merge this patch for any of the stable branches. > > This patch needs one unmerged patch ("btrfs: change timing for qgroup > reserved space for ordered extents to fix reserved space leak", already > in maintainer's tree) as prerequisite. Also add btrfs mail list to the discssusion. > > The behavior without that patch could be problematic. > > I should have noticed this earlier. > > Thanks, > Qu > >> >> >> From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001 >> From: Qu Wenruo <wqu@xxxxxxxx> >> Date: Fri, 17 Jul 2020 15:12:05 +0800 >> Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate >> >> From: Qu Wenruo <wqu@xxxxxxxx> >> >> commit fa91e4aa1716004ea8096d5185ec0451e206aea0 upstream. >> >> [BUG] >> When running tests like generic/013 on test device with btrfs quota >> enabled, it can normally lead to data leak, detected at unmount time: >> >> BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 >> ------------[ cut here ]------------ >> WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] >> RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] >> Call Trace: >> btrfs_put_super+0x15/0x17 [btrfs] >> generic_shutdown_super+0x72/0x110 >> kill_anon_super+0x18/0x30 >> btrfs_kill_super+0x17/0x30 [btrfs] >> deactivate_locked_super+0x3b/0xa0 >> deactivate_super+0x40/0x50 >> cleanup_mnt+0x135/0x190 >> __cleanup_mnt+0x12/0x20 >> task_work_run+0x64/0xb0 >> __prepare_exit_to_usermode+0x1bc/0x1c0 >> __syscall_return_slowpath+0x47/0x230 >> do_syscall_64+0x64/0xb0 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> ---[ end trace caf08beafeca2392 ]--- >> BTRFS error (device dm-3): qgroup reserved space leaked >> >> [CAUSE] >> In the offending case, the offending operations are: >> 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 >> 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0 >> >> The following sequence of events could happen after the writev(): >> CPU1 (writeback) | CPU2 (truncate) >> ----------------------------------------------------------------- >> btrfs_writepages() | >> |- extent_write_cache_pages() | >> |- Got page for 1003520 | >> | 1003520 is Dirty, no writeback | >> | So (!clear_page_dirty_for_io()) | >> | gets called for it | >> |- Now page 1003520 is Clean. | >> | | btrfs_setattr() >> | | |- btrfs_setsize() >> | | |- truncate_setsize() >> | | New i_size is 18388 >> |- __extent_writepage() | >> | |- page_offset() > i_size | >> |- btrfs_invalidatepage() | >> |- Page is clean, so no qgroup | >> callback executed >> >> This means, the qgroup reserved data space is not properly released in >> btrfs_invalidatepage() as the page is Clean. >> >> [FIX] >> Instead of checking the dirty bit of a page, call >> btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage(). >> >> As qgroup rsv are completely bound to the QGROUP_RESERVED bit of >> io_tree, not bound to page status, thus we won't cause double freeing >> anyway. >> >> Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") >> CC: stable@xxxxxxxxxxxxxxx # 4.14+ >> Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> >> Signed-off-by: David Sterba <dsterba@xxxxxxxx> >> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> >> >> --- >> fs/btrfs/inode.c | 23 ++++++++++------------- >> 1 file changed, 10 insertions(+), 13 deletions(-) >> >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -9197,20 +9197,17 @@ again: >> /* >> * Qgroup reserved space handler >> * Page here will be either >> - * 1) Already written to disk >> - * In this case, its reserved space is released from data rsv map >> - * and will be freed by delayed_ref handler finally. >> - * So even we call qgroup_free_data(), it won't decrease reserved >> - * space. >> - * 2) Not written to disk >> - * This means the reserved space should be freed here. However, >> - * if a truncate invalidates the page (by clearing PageDirty) >> - * and the page is accounted for while allocating extent >> - * in btrfs_check_data_free_space() we let delayed_ref to >> - * free the entire extent. >> + * 1) Already written to disk or ordered extent already submitted >> + * Then its QGROUP_RESERVED bit in io_tree is already cleaned. >> + * Qgroup will be handled by its qgroup_record then. >> + * btrfs_qgroup_free_data() call will do nothing here. >> + * >> + * 2) Not written to disk yet >> + * Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED >> + * bit of its io_tree, and free the qgroup reserved data space. >> + * Since the IO will never happen for this page. >> */ >> - if (PageDirty(page)) >> - btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); >> + btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); >> if (!inode_evicting) { >> clear_extent_bit(tree, page_start, page_end, >> EXTENT_LOCKED | EXTENT_DIRTY | >> >> >> Patches currently in stable-queue which might be from wqu@xxxxxxxx are >> >> queue-4.14/btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch >>