Re: [LSFMM] RDMA data corruption potential during FS writeback

On 05/18/2018 01:23 PM, Dan Williams wrote:
> On Fri, May 18, 2018 at 10:36 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>> On Fri, May 18, 2018 at 04:47:48PM +0000, Christopher Lameter wrote:
>>> On Fri, 18 May 2018, Jason Gunthorpe wrote:
>>>
---8<---------------------------------
>>>
>>> The newcomer here is RDMA. The FS side is the mainstream use case and has
>>> been there since Unix learned to do paging.
>>
>> Well, it has been this way for 12 years, so it isn't that new.
>>
>> Honestly it sounds like get_user_pages is just a broken Linux
>> API??
>>
>> Nothing can use it to write to pages because the FS could explode -
>> RDMA makes it particularly easy to trigger this due to the longer time
>> windows, but presumably any get_user_pages could generate a race and
>> hit this? Is that right?

+1, and I am now super-interested in this conversation, because I have
tracked a kernel BUG down to this classic mistaken pattern:

    get_user_pages (on file-backed memory from ext4)
    ...do some DMA
    set_page_dirty (on each page)
    put_page(s)

With this pattern there is (rarely!) a backtrace from ext4, which disavows
ownership of any such pages. It happens rarely enough that people have come
to believe that the pattern is OK, from what I can tell. But some new,
cutting-edge systems with zillions of threads and lots of memory are able to
expose the problem.

Anyway, I've been dividing my time between trying to prove exactly
which FS action is disconnecting the page from ext4 in this particular
bug (even though it's lately becoming well-documented that the design itself
is not correct), and casting about for the right place to fix this.

Because the obvious "fix" in device driver land is to use a dedicated
buffer for DMA, and copy to the filesystem buffer, and of course I will
get *killed* if I propose such a performance-killing approach. But a
core kernel fix really is starting to sound attractive.
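
To make the bounce-buffer idea concrete, here is a minimal userspace sketch
in C. It is an analogy only, not kernel code: the names fake_dma_write() and
bounce_write() are hypothetical, and the "device" is simulated with memset().
The point is just that the device never writes to the file-backed memory
directly, at the price of one extra copy per transfer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device writing into memory via DMA. */
static void fake_dma_write(unsigned char *dst, size_t len,
                           unsigned char pattern)
{
    memset(dst, pattern, len);
}

/*
 * Bounce-buffer pattern: the "device" only ever writes to a dedicated
 * buffer; the CPU then copies the result into the file-backed buffer.
 * Returns 0 on success, -1 if the bounce buffer cannot be allocated.
 */
static int bounce_write(unsigned char *file_buf, size_t len,
                        unsigned char pattern)
{
    unsigned char *bounce = malloc(len);

    if (!bounce)
        return -1;

    fake_dma_write(bounce, len, pattern); /* DMA targets the bounce buffer */
    memcpy(file_buf, bounce, len);        /* the extra copy: the perf cost */
    free(bounce);
    return 0;
}
```

Calling bounce_write(buf, sizeof(buf), 0xAB) on an ordinary array fills it
with 0xAB. In a real driver the copy would sit on the hot path of every
transfer, which is why this approach is such a hard sell.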

>>
>> I am left with the impression that solving it in the FS is too
>> performance costly so FS doesn't want that overhead? Was that also
>> the conclusion?
>>
>> Could we take another crack at this during Linux Plumbers? Will the MM
>> parties be there too? I'm sorry I wasn't able to attend LSFMM this
>> year!
> 
> Yes, you and hch were missed, and I had to skip the last day due to a
> family emergency.
> 
> Plumbers sounds good to resync on this topic, but we already have a
> plan: use "break_layouts()" to coordinate a filesystem's need to move
> dax blocks around relative to an active RDMA memory registration. If
> you never punch a hole in the middle of your RDMA registration then
> you never incur any performance penalty. Otherwise the layout break
> notification is just there to tell the application "hey man, talk to
> your friend that punched a hole in the middle of your mapping, but the
> filesystem wants this block back now. Sorry, I'm kicking you out. Ok,
> bye.".
> 
> In other words, get_user_pages_longterm() is just a short term
> band-aid for RDMA until we can get that infrastructure built. We don't
> need to go down any mmu-notifier rabbit holes.
> 

git grep claims that break_layouts is so far an XFS-only feature, though. 
Were there plans to fix this for all filesystems?


thanks,
-- 
John Hubbard
NVIDIA