Patch "btrfs: qgroup: fix data leak caused by race between writeback and truncate" has been added to the 4.14-stable tree

This is a note to let you know that I've just added the patch titled

    btrfs: qgroup: fix data leak caused by race between writeback and truncate

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     btrfs-qgroup-fix-data-leak-caused-by-race-between-wr.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit fa4d196c7607cf6bca44d1d2d85f8bba4891e436
Author: Qu Wenruo <wqu@xxxxxxxx>
Date:   Fri Jul 17 15:12:05 2020 +0800

    btrfs: qgroup: fix data leak caused by race between writeback and truncate
    
    [ Upstream commit fa91e4aa1716004ea8096d5185ec0451e206aea0 ]
    
    [BUG]
    When running tests like generic/013 on a test device with btrfs quota
    enabled, qgroup reserved data space can leak. The leak is detected at
    unmount time:
    
      BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
      ------------[ cut here ]------------
      WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
      RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
      Call Trace:
       btrfs_put_super+0x15/0x17 [btrfs]
       generic_shutdown_super+0x72/0x110
       kill_anon_super+0x18/0x30
       btrfs_kill_super+0x17/0x30 [btrfs]
       deactivate_locked_super+0x3b/0xa0
       deactivate_super+0x40/0x50
       cleanup_mnt+0x135/0x190
       __cleanup_mnt+0x12/0x20
       task_work_run+0x64/0xb0
       __prepare_exit_to_usermode+0x1bc/0x1c0
       __syscall_return_slowpath+0x47/0x230
       do_syscall_64+0x64/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ---[ end trace caf08beafeca2392 ]---
      BTRFS error (device dm-3): qgroup reserved space leaked
    
    [CAUSE]
    In the failing case, the offending operations are:
    2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
    2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
    
    The following sequence of events could happen after the writev():
            CPU1 (writeback)                |               CPU2 (truncate)
    -----------------------------------------------------------------
    btrfs_writepages()                      |
    |- extent_write_cache_pages()           |
       |- Got page for 1003520              |
       |  1003520 is Dirty, no writeback    |
       |  So (!clear_page_dirty_for_io())   |
       |  gets called for it                |
       |- Now page 1003520 is Clean.        |
       |                                    | btrfs_setattr()
       |                                    | |- btrfs_setsize()
       |                                    |    |- truncate_setsize()
       |                                    |       New i_size is 18388
       |- __extent_writepage()              |
       |  |- page_offset() > i_size         |
          |- btrfs_invalidatepage()         |
             |- Page is clean, so no qgroup |
                callback executed
    
    This means the qgroup reserved data space is not released in
    btrfs_invalidatepage(), because the page is already Clean.
    
    [FIX]
    Instead of checking the dirty bit of a page, call
    btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
    
    Since qgroup rsv is bound to the QGROUP_RESERVED bit of io_tree rather
    than to page status, the unconditional call cannot cause double
    freeing.
    
    Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
    CC: stable@xxxxxxxxxxxxxxx # 4.14+
    Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
    Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
    Signed-off-by: David Sterba <dsterba@xxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17856e92b93d1..c9e7b92d0f212 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9204,20 +9204,17 @@ again:
 	/*
 	 * Qgroup reserved space handler
 	 * Page here will be either
-	 * 1) Already written to disk
-	 *    In this case, its reserved space is released from data rsv map
-	 *    and will be freed by delayed_ref handler finally.
-	 *    So even we call qgroup_free_data(), it won't decrease reserved
-	 *    space.
-	 * 2) Not written to disk
-	 *    This means the reserved space should be freed here. However,
-	 *    if a truncate invalidates the page (by clearing PageDirty)
-	 *    and the page is accounted for while allocating extent
-	 *    in btrfs_check_data_free_space() we let delayed_ref to
-	 *    free the entire extent.
+	 * 1) Already written to disk or ordered extent already submitted
+	 *    Then its QGROUP_RESERVED bit in io_tree is already cleaned.
+	 *    Qgroup will be handled by its qgroup_record then.
+	 *    btrfs_qgroup_free_data() call will do nothing here.
+	 *
+	 * 2) Not written to disk yet
+	 *    Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED
+	 *    bit of its io_tree, and free the qgroup reserved data space.
+	 *    Since the IO will never happen for this page.
 	 */
-	if (PageDirty(page))
-		btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
+	btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
 	if (!inode_evicting) {
 		clear_extent_bit(tree, page_start, page_end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |


