Re: [PATCH][v2] btrfs: run delayed items before dropping the snapshot

Filipe Manana <fdmanana@xxxxxxxxx> · Wed, 5 Dec 2018 20:13:20 +0000

On Wed, Dec 5, 2018 at 5:14 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> From: Josef Bacik <jbacik@xxxxxx>
>
> With my delayed refs patches in place we started seeing a large amount
> of aborts in __btrfs_free_extent
>
> BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964  owner 1 offset 0
> Call Trace:
>  ? btrfs_merge_delayed_refs+0xaf/0x340
>  __btrfs_run_delayed_refs+0x6ea/0xfc0
>  ? btrfs_set_path_blocking+0x31/0x60
>  btrfs_run_delayed_refs+0xeb/0x180
>  btrfs_commit_transaction+0x179/0x7f0
>  ? btrfs_check_space_for_delayed_refs+0x30/0x50
>  ? should_end_transaction.isra.19+0xe/0x40
>  btrfs_drop_snapshot+0x41c/0x7c0
>  btrfs_clean_one_deleted_snapshot+0xb5/0xd0
>  cleaner_kthread+0xf6/0x120
>  kthread+0xf8/0x130
>  ? btree_invalidatepage+0x90/0x90
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x35/0x40
>
> This was because btrfs_drop_snapshot depends on the root not being modified
> while it's dropping the snapshot.  It will unlock the root node (and really
> every node) as it walks down the tree, only to re-lock it when it needs to do
> something.  This is a problem because if we modify the tree we could cow a block
> in our path, which free's our reference to that block.  Then once we get back to
> that shared block we'll free our reference to it again, and get ENOENT when
> trying to lookup our extent reference to that block in __btrfs_free_extent.
>
> This is ultimately happening because we have delayed items left to be processed
> for our deleted snapshot _after_ all of the inodes are closed for the snapshot.
> We only run the delayed inode item if we're deleting the inode, and even then we
> do not run the delayed insertions or delayed removals.  These can be run at any
> point after our final inode does it's last iput, which is what triggers the
> snapshot deletion.  We can end up with the snapshot deletion happening and then
> have the delayed items run on that file system, resulting in the above problem.
>
> This problem has existed forever, however my patches made it much easier to hit
> as I wake up the cleaner much more often to deal with delayed iputs, which made
> us more likely to start the snapshot dropping work before the transaction
> commits, which is when the delayed items would generally be run.  Before,
> generally speaking, we would run the delayed items, commit the transaction, and
> wakeup the cleaner thread to start deleting snapshots, which means we were less
> likely to hit this problem.  You could still hit it if you had multiple
> snapshots to be deleted and ended up with lots of delayed items, but it was
> definitely harder.
>
> Fix for now by simply running all the delayed items before starting to drop the
> snapshot.  We could make this smarter in the future by making the delayed items
> per-root, and then simply drop any delayed items for roots that we are going to
> delete.  But for now just a quick and easy solution is the safest.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

Looks good now. Thanks.

> ---
> v1->v2:
> - check for errors from btrfs_run_delayed_items.
> - Dave I can reroll the series, but the second version of patch 1 is the same,
>   let me know what you want.
>
>  fs/btrfs/extent-tree.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index dcb699dd57f3..473084eb7a2d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>                 goto out_free;
>         }
>
> +       err = btrfs_run_delayed_items(trans);
> +       if (err)
> +               goto out_end_trans;
> +
>         if (block_rsv)
>                 trans->block_rsv = block_rsv;
>
> --
> 2.14.3
>

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”