Re: [PATCH v4 17/18] mm, fs, dax: dax_flush_dma, handle dma vs block-map-change collisions

On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Sun 07-01-18 13:58:42, Dan Williams wrote:
>> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
>> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
>> >
>> >> +     /*
>> >> +      * Flush dax_dma_lock() sections to ensure all possible page
>> >> +      * references have been taken, or will block on the fs
>> >> +      * 'mmap_lock'.
>> >> +      */
>> >> +     synchronize_rcu();
>> >
>> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
>> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
>> > for the pfn? We could indicate that by some bit in struct page or something
>> > like that.
>>
>> We would need a lockless way to take a reference conditionally if the
>> page is not subject to truncation.
>>
>> I recall the raid5 code did something similar where it split a
>> reference count into 2 fields. I.e. take page->_refcount and use the
>> upper bits as a truncation count. Something like:
>>
>> do {
>>     old = atomic_read(&page->_refcount);
>>     if (old & trunc_mask) /* upper bits of _refcount */
>>         return false;
>>     new = old + 1;
>> } while (atomic_cmpxchg(&page->_refcount, old, new) != old);
>> return true; /* we incremented the _refcount while the truncation
>> count was zero */
>>
>> ...the only concern is teaching the put_page() path to consider that
>> 'trunc_mask' when determining that the page is idle.
>>
>> Other ideas?
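
To make the split-_refcount idea above a bit more concrete, here is a
rough, untested sketch of what the conditional get and the matching put
could look like. All of the names here (DAX_TRUNC_MASK,
dax_try_get_page(), dax_put_page()) are made up for illustration, and
the "idle" test is only indicative:

#include <linux/mm.h>
#include <linux/atomic.h>

/* upper bits of page->_refcount reserved as a truncation count */
#define DAX_TRUNC_MASK 0xc0000000

static bool dax_try_get_page(struct page *page)
{
        int old, new;

        do {
                old = atomic_read(&page->_refcount);
                if (old & DAX_TRUNC_MASK)
                        return false;   /* truncation in flight, bail */
                new = old + 1;
        } while (atomic_cmpxchg(&page->_refcount, old, new) != old);

        /* reference taken while the truncation count was zero */
        return true;
}

static void dax_put_page(struct page *page)
{
        int remaining = atomic_dec_return(&page->_refcount) & ~DAX_TRUNC_MASK;

        /*
         * Mask off the truncation count before the idle test,
         * otherwise a pending truncation would make the page look
         * permanently busy.
         */
        if (!remaining) {
                /* page is idle; wake up any truncate path waiting on it */
        }
}

The only interesting part is the masking in the put path; how the
truncate side actually waits for idle pages is left out here.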
>
> What I rather thought about was an update to GUP paths (like
> follow_page_pte()):
>
>         if (flags & FOLL_GET) {
>                 get_page(page);
>                 if (pte_devmap(pte)) {
>                         /*
>                          * Pairs with the barrier in the truncate path.
>                          * Could be possibly _after_atomic version of the
>                          * barrier.
>                          */
>                         smp_mb();
>                         if (PageTruncateInProgress(page)) {
>                                 put_page(page);
>                                 ..bail...
>                         }
>                 }
>         }
>
> and in the truncate path:
>
>         down_write(inode->i_mmap_sem);
>         walk all pages in the mapping and mark them PageTruncateInProgress().
>         unmap_mapping_range(...);
>         /*
>          * Pairs with the barrier in GUP path. In fact not necessary since
>          * unmap_mapping_range() provides us with the barrier already.
>          */
>         smp_mb();
>         /*
>          * By now we are either guaranteed to see grabbed page reference or
>          * GUP is guaranteed to see PageTruncateInProgress().
>          */
>         while ((page = dax_find_referenced_page(mapping))) {
>                 ...
>         }
>
> The barriers need some verification; I've opted for the conservative option,
> but I guess you get the idea.

[ Reviving this thread for the next rev of this patch set for 4.17
consideration ]

I don't think this barrier scheme can work in the presence of
get_user_pages_fast(). The get_user_pages_fast() path can race
unmap_mapping_range() to take out an elevated reference count on a
page.
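
For reference, the fast-GUP pattern in question, heavily simplified
(this is not the actual gup_pte_range() body, just an illustration of
the shape of the walk): the pte read and the get_page() happen with
only local interrupts disabled, with neither mmap_sem nor the fs mmap
lock held:

#include <linux/mm.h>

static int gup_fast_pte_sketch(pte_t *ptep, struct page **pages)
{
        /* lockless: no mmap_sem, no fs mmap lock, only irqs disabled */
        pte_t pte = READ_ONCE(*ptep);
        struct page *page;

        if (!pte_present(pte))
                return 0;

        page = pte_page(pte);
        /*
         * This reference can be taken concurrently with
         * unmap_mapping_range() and the truncate-side scan for
         * referenced pages running on another CPU.
         */
        get_page(page);
        pages[0] = page;
        return 1;
}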


