Re: [PATCH] hfsplus: avoid deadlock on file truncation

Ernesto A. Fernández <ernesto.mnd.fernandez@xxxxxxxxx> · Mon, 9 Jul 2018 23:29:20 -0300

Hi:

On Mon, Jul 09, 2018 at 02:16:33PM -0700, Andrew Morton wrote:
> On Mon, 9 Jul 2018 17:25:52 -0300 Ernesto A. Fernández <ernesto.mnd.fernandez@xxxxxxxxx> wrote:
> 
> > After an extent is removed from the extent tree, the corresponding bits
> > are also cleared from the block allocation file. This is currently done
> > without releasing the tree lock.
> > 
> > The problem is that the allocation file has extents of its own; if it is
> > fragmented enough, some of them may be in the extent tree as well, and
> > hfsplus_get_block() will try to take the lock again.
> > 
> > To avoid deadlock, only hold the extent tree lock during the actual tree
> > operations.
> 
> There's now a whole bunch of code which no longer has ->tree_lock
> coverage.  Are you sure this doesn't introduce races?
> 
> Do we know what specifically the tree_lock is protecting?  That doesn't
> just mean what is it *supposed* to protect - it might be protecting
> other things by accident...

The tree_lock is only taken in two other places: hfsplus_ext_read_extent()
and hfsplus_ext_write_extent_locked(). This is always for the purpose of
working with the extent tree, and hip->extents_lock is taken beforehand
to protect the extents kept in the inode.

hfsplus_free_extents() doesn't work with the extents in the tree; only
with local copies (in hfsplus_free_fork()), or with the cached copies
and in-inode extents (in hfsplus_file_truncate()). Those are protected
by hip->extents_lock, and are marked as dirty if needed so they can
later be copied to the extent tree, under the tree_lock.

So hfsplus_free_extents() doesn't do anything that could cause a race
until the call to hfsplus_block_free(), which quickly takes the alloc
mutex (and will take the extent tree_lock again if it's needed, as well
as the extents_lock of the alloc inode itself).

There is one problem though. If the extent file itself is full and needs
new blocks, a call to hfsplus_ext_write_extent_locked() will at some point
call hfsplus_block_allocate() and try to take the alloc mutex while holding
the tree_lock. This could deadlock with hfsplus_block_free() because of the
inverted order of the locks (only if the allocation file is fragmented
enough to have extents in the tree).

I mostly ignored situations where the extent file needs to grow and the
allocation file has extents in the tree, because even without my patch
those will always deadlock at hfsplus_file_extend() when
hfsplus_block_allocate() tries to take the extent tree_lock again. And
the next call to hfsplus_ext_write_extent_locked() would also deadlock,
if it was ever reached.

The bug here is that hfsplus_file_extend() was clearly never supposed to
be called while holding ->tree_lock, since it takes hip->extents_lock.
Even more potential for deadlock there.

I think the solution to this whole mess is to drop the lock of the extent
tree whenever possible, and keep the locking order as:

hip->extents_lock
  alloc_mutex
    alloc_hip->extents_lock
      ->tree_lock

If you prefer I can try to solve the whole thing in a single patch, but
I first need to understand better what happens when the extent file gains
new in-tree extents itself.

Of course if you can think of a simpler solution it would be great. I am
overwhelmed by this.

> 
> 
> > --- a/fs/hfsplus/extents.c
> > +++ b/fs/hfsplus/extents.c
> >
> > ...
> >
> > @@ -576,15 +581,20 @@ void hfsplus_file_truncate(struct inode *inode)
> >  	}
> >  	while (1) {
> >  		if (alloc_cnt == hip->first_blocks) {
> > +			mutex_unlock(&fd.tree->tree_lock);
> >  			hfsplus_free_extents(sb, hip->first_extents,
> >  					     alloc_cnt, alloc_cnt - blk_cnt);
> >  			hfsplus_dump_extent(hip->first_extents);
> >  			hip->first_blocks = blk_cnt;
> > +			mutex_lock(&fd.tree->tree_lock);
> >  			break;
> >  		}
> 
> offtopic: hfsplus_free_extents() does hfsplus_dump_extent(), so I
> wonder whether this hfsplus_dump_extent() call is duplicative.

hfsplus_free_extents() does make modifications to the extents outside
the tree, so one may want to check what they look like after the call.

I hope I was clear. Thanks,
Ernest

> 
> >
> > ...
> >