Re: xfstests 344 deadlock on NOVA

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 27, 2018 at 1:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Feb 27, 2018 at 11:15:55AM -0800, Andiry Xu wrote:
>> Hi,
>>
>> I encounter a problem when running xfstests on NOVA. I appreciate your
>> help very much.
>>
>> Some background: NOVA adopts a per-inode logging design. Metadata
>> changes are appended to the log and persisted before returning to the
>> user space. For example, a write() in NOVA works like this:
>>
>> Allocate new pmem blocks and fill with user data
>> Append the metadata that describes this write to the end of the inode log
>> Update the log tail pointer atomically to commit the write
>> Update in-DRAM radix tree to point to the metadata (for fast lookup)
>>
>> The log appending and radix tree update are protected by inode_lock().
>>
>> For mmap, nova_dax_get_blocks (similar to ext2_get_blocks)  needs to
>> persist the metadata for new allocations. So it has to append the new
>> allocation metadata to the log, and hence needs to lock the inode.
>> This causes deadlock in xfstests 344 with concurrent pwrite and mmap
>> write:
>>
>> Thread 1:
>> pwrite
>> -> nova_dax_file_write()
>> ---> inode_lock()
>> -----> invalidate_inode_pages2_range()
>> -------> schedule()
>
> Why did this thread schedule here?
>
>> Thread 2:
>> dax_fault
>> -> nova_dax_get_blocks()
>> ---> inode_lock() // deadlock
>
> It's just waiting on an inode_lock() to be released by another
> thread. What resource is it holding locked that the first thread
> needs to make progress?
>
>> If I remove invalidate_inode_pages2_range() in write path, xfstests
>> 344 passed, but 428 will fail.
>>
>> It appears to me that ext2/ext4 does not have this issue because they
>> don't persist metadata immediately and hence do not take inode lock.
>
> Did you realise that you can't take the inode->i_rwsem in the page
> fault path (i.e. under the mmap_sem) because that's a known deadlock
> vector against the read/write IO path?
>
> (i.e. you can use a mmap'd buffer over a range of the same file as
> the data buffer for the IO, then take a page fault when trying to
> copy data into/out of that buffer while holding the inode->i_rwsem)
>
>> But nova_dax_get_blocks() has to persist the metadata and needs to
>> lock the inode to access the log. Is there a way to workaround this?
>> Thank you very much.
>
> I'm pretty sure you don't want to use inode->i_rwsem for internal
> metadata serialisation requirements. XFS uses xfs_inode->i_ilock for
> this, ext4 uses a combination of other locks, etc, and it's done to
> separate internal serialisation requirements from user data access
> and VFS serialisation requirements...
>

Thank you. I have fixed the issue by using a different semaphore.

Thanks,
Andiry

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux