Hi: On Mon, Jul 09, 2018 at 02:16:33PM -0700, Andrew Morton wrote: > On Mon, 9 Jul 2018 17:25:52 -0300 Ernesto A. Fernández <ernesto.mnd.fernandez@xxxxxxxxx> wrote: > > > After an extent is removed from the extent tree, the corresponding bits > > are also cleared from the block allocation file. This is currently done > > without releasing the tree lock. > > > > The problem is that the allocation file has extents of its own; if it is > > fragmented enough, some of them may be in the extent tree as well, and > > hfsplus_get_block() will try to take the lock again. > > > > To avoid deadlock, only hold the extent tree lock during the actual tree > > operations. > > There's now a whole bunch of code which no longer has ->tree_lock > coverage. Are you sure this doesn't introduce races? > > Do we know what specifically the tree_lock is protecting? That doesn't > just mean what is it *supposed* to protect - it might be protecting > other things by accident... The tree_lock is only taken in two other places: hfsplus_ext_read_extent() and hfsplus_ext_write_extent_locked(). This is always for the purpose of working with the extent tree, and hip->extents_lock is taken beforehand to protect the extents kept in the inode. hfsplus_free_extents() doesn't work with the extents in the tree; only with local copies (in hfsplus_free_fork()), or with the cached copies and in-inode extents (in hfsplus_file_truncate()). Those are protected by hip->extents_lock, and are marked as dirty if needed so they can later be copied to the extent tree, under the tree_lock. So hfsplus_free_extents() doesn't do anything that could cause a race until the call to hfsplus_block_free(), which quickly takes the alloc mutex (and will take the extent tree_lock again if it's needed, as well as the extents_lock of the alloc inode itself). There is one problem though. If the extent file itself is full and needs new blocks, a call to hfsplus_ext_write_extent_locked() will at some point call hfsplus_block_allocate() and try to take the alloc mutex while holding the tree_lock. This could deadlock with hfsplus_block_free() because of the inverted order of the locks (only if the allocation file is fragmented enough to have extents in the tree). I mostly ignored situations where the extent file needs to grow and the allocation file has extents in the tree, because even without my patch those will always deadlock at hfsplus_file_extend() when hfsplus_block_allocate() tries to take the extent tree_lock again. And the next call to hfsplus_ext_write_extent_locked() would also deadlock, if it was ever reached. The bug here is that hfsplus_file_extend() was clearly never supposed to be called while holding ->tree_lock, since it takes hip->extents_lock. Even more potential for deadlock there. I think the solution to this whole mess is to drop the lock of the extent tree whenever possible, and keep the locking order as: hip->extents_lock alloc_mutex alloc_hip->extents_lock ->tree_lock If you prefer I can try to solve the whole thing in a single patch, but I first need to understand better what happens when the extent file gains new in-tree extents itself. Of course if you can think of a simpler solution it would be great. I am overwhelmed by this. > > > > --- a/fs/hfsplus/extents.c > > +++ b/fs/hfsplus/extents.c > > > > ... > > > > @@ -576,15 +581,20 @@ void hfsplus_file_truncate(struct inode *inode) > > } > > while (1) { > > if (alloc_cnt == hip->first_blocks) { > > + mutex_unlock(&fd.tree->tree_lock); > > hfsplus_free_extents(sb, hip->first_extents, > > alloc_cnt, alloc_cnt - blk_cnt); > > hfsplus_dump_extent(hip->first_extents); > > hip->first_blocks = blk_cnt; > > + mutex_lock(&fd.tree->tree_lock); > > break; > > } > > offtopic: hfsplus_free_extents() does hfsplus_dump_extent(), so I > wonder whether this hfsplus_dump_extent() call is duplicative. hfsplus_free_extents() does make modifications to the extents outside the tree, so one may want to check what they look like after the call. I hope I was clear. Thanks, Ernest > > > > > ... > >