Re: xfstests 344 deadlock on NOVA

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 28 Feb 2018 08:41:44 +1100

On Tue, Feb 27, 2018 at 11:15:55AM -0800, Andiry Xu wrote:
> Hi,
> 
> I encounter a problem when running xfstests on NOVA. I appreciate your
> help very much.
> 
> Some background: NOVA adopts a per-inode logging design. Metadata
> changes are appended to the log and persisted before returning to the
> user space. For example, a write() in NOVA works like this:
> 
> 1. Allocate new pmem blocks and fill with user data
> 2. Append the metadata that describes this write to the end of the inode log
> 3. Update the log tail pointer atomically to commit the write
> 4. Update in-DRAM radix tree to point to the metadata (for fast lookup)
> 
> The log appending and radix tree update are protected by inode_lock().
> 
> For mmap, nova_dax_get_blocks (similar to ext2_get_blocks) needs to
> persist the metadata for new allocations. So it has to append the new
> allocation metadata to the log, and hence needs to lock the inode.
> This causes deadlock in xfstests 344 with concurrent pwrite and mmap
> write:
> 
> Thread 1:
> pwrite
> -> nova_dax_file_write()
> ---> inode_lock()
> -----> invalidate_inode_pages2_range()
> -------> schedule()

Why did this thread schedule here?

> Thread 2:
> dax_fault
> -> nova_dax_get_blocks()
> ---> inode_lock() // deadlock

It's just waiting on an inode_lock() to be released by another
thread. What resource is it holding locked that the first thread
needs to make progress?

> If I remove invalidate_inode_pages2_range() in write path, xfstests
> 344 passed, but 428 will fail.
> 
> It appears to me that ext2/ext4 does not have this issue because they
> don't persist metadata immediately and hence do not take inode lock.

Did you realise that you can't take the inode->i_rwsem in the page
fault path (i.e. under the mmap_sem) because that's a known deadlock
vector against the read/write IO path?

(i.e. you can use a mmap'd buffer over a range of the same file as
the data buffer for the IO, then take a page fault when trying to
copy data into/out of that buffer while holding the inode->i_rwsem)
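
To make the inversion concrete, here is a minimal userspace sketch, not
kernel code. It only records the order in which two lock classes are
taken and flags the second path when it reverses the first, roughly the
kind of check lockdep performs. The enum names, struct names and
functions are all hypothetical stand-ins for illustration.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical lock classes standing in for the two kernel locks above. */
enum lock_class { I_RWSEM, MMAP_SEM, NR_CLASSES };

/* order[a][b] is true once some path has taken b while holding a. */
struct order_graph {
	bool order[NR_CLASSES][NR_CLASSES];
};

/* The set of lock classes one code path currently holds. */
struct path {
	bool held[NR_CLASSES];
};

/*
 * Record "held -> next" ordering edges. Returns false if the reverse
 * edge was already recorded by another path (an ABBA inversion).
 */
bool acquire(struct order_graph *g, struct path *p, enum lock_class next)
{
	bool ok = true;

	for (int h = 0; h < NR_CLASSES; h++) {
		if (!p->held[h])
			continue;
		if (g->order[next][h])
			ok = false;
		g->order[h][next] = true;
	}
	p->held[next] = true;
	return ok;
}

/* write(): inode lock first, then a fault on the mmap'd user buffer. */
bool write_path(struct order_graph *g)
{
	struct path p = {0};
	bool ok = acquire(g, &p, I_RWSEM);

	return acquire(g, &p, MMAP_SEM) && ok;
}

/* Page fault that then takes the inode lock: the order in question. */
bool fault_path_taking_inode_lock(struct order_graph *g)
{
	struct path p = {0};
	bool ok = acquire(g, &p, MMAP_SEM);

	return acquire(g, &p, I_RWSEM) && ok;
}
```

Running write_path() and then fault_path_taking_inode_lock() against
the same graph makes the second call report the inversion.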

> But nova_dax_get_blocks() has to persist the metadata and needs to
> lock the inode to access the log. Is there a way to work around this?
> Thank you very much.

I'm pretty sure you don't want to use inode->i_rwsem for internal
metadata serialisation requirements. XFS uses xfs_inode->i_ilock for
this, ext4 uses a combination of other locks, etc, and it's done to
separate internal serialisation requirements from user data access
and VFS serialisation requirements...
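
As a rough illustration of that separation, here is a userspace sketch
using pthread mutexes. It is not NOVA, XFS or ext4 code: struct
sketch_inode, log_lock and every function below are hypothetical names.
vfs_lock stands in for inode->i_rwsem, and log_lock for a
filesystem-private lock that serialises log appends. It assumes the log
is the only state the fault path has to protect.

```c
#include <pthread.h>
#include <stddef.h>

#define LOG_CAPACITY 16

/* Hypothetical in-memory inode with two separate locks. */
struct sketch_inode {
	pthread_mutex_t vfs_lock;	/* stands in for inode->i_rwsem */
	pthread_mutex_t log_lock;	/* internal: protects log[] and log_tail */
	int log[LOG_CAPACITY];
	size_t log_tail;
};

void sketch_inode_init(struct sketch_inode *ino)
{
	pthread_mutex_init(&ino->vfs_lock, NULL);
	pthread_mutex_init(&ino->log_lock, NULL);
	ino->log_tail = 0;
}

/* Append one entry under the internal lock only. Returns 0, or -1 if full. */
int log_append(struct sketch_inode *ino, int entry)
{
	int ret = -1;

	pthread_mutex_lock(&ino->log_lock);
	if (ino->log_tail < LOG_CAPACITY) {
		ino->log[ino->log_tail] = entry;
		ino->log_tail++;	/* advancing the tail commits the entry */
		ret = 0;
	}
	pthread_mutex_unlock(&ino->log_lock);
	return ret;
}

/* Write path: VFS-level lock for the IO, internal lock nested inside it. */
int write_path(struct sketch_inode *ino, int entry)
{
	int ret;

	pthread_mutex_lock(&ino->vfs_lock);
	ret = log_append(ino, entry);
	pthread_mutex_unlock(&ino->vfs_lock);
	return ret;
}

/* Fault path: never touches vfs_lock, so it cannot block behind a writer. */
int fault_path(struct sketch_inode *ino, int entry)
{
	return log_append(ino, entry);
}
```

With this split the fault path can append to the log even while
another context holds vfs_lock, because the only lock it takes is held
for the duration of the append and nothing else.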

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx