On Tue, Feb 27, 2018 at 1:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > On Tue, Feb 27, 2018 at 11:15:55AM -0800, Andiry Xu wrote: >> Hi, >> >> I encounter a problem when running xfstests on NOVA. I appreciate your >> help very much. >> >> Some background: NOVA adopts a per-inode logging design. Metadata >> changes are appended to the log and persisted before returning to the >> user space. For example, a write() in NOVA works like this: >> >> Allocate new pmem blocks and fill with user data >> Append the metadata that describes this write to the end of the inode log >> Update the log tail pointer atomically to commit the write >> Update in-DRAM radix tree to point to the metadata (for fast lookup) >> >> The log appending and radix tree update are protected by inode_lock(). >> >> For mmap, nova_dax_get_blocks (similar to ext2_get_blocks) needs to >> persist the metadata for new allocations. So it has to append the new >> allocation metadata to the log, and hence needs to lock the inode. >> This causes deadlock in xfstests 344 with concurrent pwrite and mmap >> write: >> >> Thread 1: >> pwrite >> -> nova_dax_file_write() >> ---> inode_lock() >> -----> invalidate_inode_pages2_range() >> -------> schedule() > > Why did this thread schedule here? > >> Thread 2: >> dax_fault >> -> nova_dax_get_blocks() >> ---> inode_lock() // deadlock > > It's just waiting on an inode_lock() to be released by another > thread. What resource is it holding locked that the first thread > needs to make progress? > >> If I remove invalidate_inode_pages2_range() in write path, xfstests >> 344 passed, but 428 will fail. >> >> It appears to me that ext2/ext4 does not have this issue because they >> don't persist metadata immediately and hence do not take inode lock. > > Did you realise that you can't take the inode->i_rwsem in the page > fault path (i.e. under the mmap_sem) because that's a known deadlock > vector against the read/write IO path? > > (i.e. you can use a mmap'd buffer over a range of the same file as > the data buffer for the IO, then take a page fault when trying to > copy data into/out of that buffer while holding the inode->i_rwsem) > >> But nova_dax_get_blocks() has to persist the metadata and needs to >> lock the inode to access the log. Is there a way to workaround this? >> Thank you very much. > > I'm pretty sure you don't want to use inode->i_rwsem for internal > metadata serialisation requirements. XFS uses xfs_inode->i_ilock for > this, ext4 uses a combination of other locks, etc, and it's done to > separate internal serialisation requirements from user data access > and VFS serialisation requirements... > Thank you. I have fixed the issue by using a different semaphore. Thanks, Andiry > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx