On Wed, Dec 5, 2018 at 5:14 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > From: Josef Bacik <jbacik@xxxxxx> > > With my delayed refs patches in place we started seeing a large amount > of aborts in __btrfs_free_extent > > BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0 > Call Trace: > ? btrfs_merge_delayed_refs+0xaf/0x340 > __btrfs_run_delayed_refs+0x6ea/0xfc0 > ? btrfs_set_path_blocking+0x31/0x60 > btrfs_run_delayed_refs+0xeb/0x180 > btrfs_commit_transaction+0x179/0x7f0 > ? btrfs_check_space_for_delayed_refs+0x30/0x50 > ? should_end_transaction.isra.19+0xe/0x40 > btrfs_drop_snapshot+0x41c/0x7c0 > btrfs_clean_one_deleted_snapshot+0xb5/0xd0 > cleaner_kthread+0xf6/0x120 > kthread+0xf8/0x130 > ? btree_invalidatepage+0x90/0x90 > ? kthread_bind+0x10/0x10 > ret_from_fork+0x35/0x40 > > This was because btrfs_drop_snapshot depends on the root not being modified > while it's dropping the snapshot. It will unlock the root node (and really > every node) as it walks down the tree, only to re-lock it when it needs to do > something. This is a problem because if we modify the tree we could cow a block > in our path, which free's our reference to that block. Then once we get back to > that shared block we'll free our reference to it again, and get ENOENT when > trying to lookup our extent reference to that block in __btrfs_free_extent. > > This is ultimately happening because we have delayed items left to be processed > for our deleted snapshot _after_ all of the inodes are closed for the snapshot. > We only run the delayed inode item if we're deleting the inode, and even then we > do not run the delayed insertions or delayed removals. These can be run at any > point after our final inode does it's last iput, which is what triggers the > snapshot deletion. We can end up with the snapshot deletion happening and then > have the delayed items run on that file system, resulting in the above problem. > > This problem has existed forever, however my patches made it much easier to hit > as I wake up the cleaner much more often to deal with delayed iputs, which made > us more likely to start the snapshot dropping work before the transaction > commits, which is when the delayed items would generally be run. Before, > generally speaking, we would run the delayed items, commit the transaction, and > wakeup the cleaner thread to start deleting snapshots, which means we were less > likely to hit this problem. You could still hit it if you had multiple > snapshots to be deleted and ended up with lots of delayed items, but it was > definitely harder. > > Fix for now by simply running all the delayed items before starting to drop the > snapshot. We could make this smarter in the future by making the delayed items > per-root, and then simply drop any delayed items for roots that we are going to > delete. But for now just a quick and easy solution is the safest. > > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> Looks good now. Thanks. > --- > v1->v2: > - check for errors from btrfs_run_delayed_items. > - Dave I can reroll the series, but the second version of patch 1 is the same, > let me know what you want. > > fs/btrfs/extent-tree.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index dcb699dd57f3..473084eb7a2d 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, > goto out_free; > } > > + err = btrfs_run_delayed_items(trans); > + if (err) > + goto out_end_trans; > + > if (block_rsv) > trans->block_rsv = block_rsv; > > -- > 2.14.3 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”