Re: [LSFMM] RDMA data corruption potential during FS writeback

On Fri, May 18, 2018 at 08:51:38PM -0700, Dan Williams wrote:
> >> +1, and I am now super-interested in this conversation, because
> >> after tracking down a kernel BUG to this classic mistaken pattern:
> >>
> >>     get_user_pages (on file-backed memory from ext4)
> >>     ...do some DMA
> >>     set_pages_dirty
> >>     put_page(s)
> >
> > Ummm, RDMA has done essentially that since 2005; since when did it
> > become wrong? Do you have some references? Is there some alternative?
> >
> > See __ib_umem_release
> >
> >> ...there is (rarely!) a backtrace from ext4, that disavows ownership of
> >> any such pages.
> >
> > Yes, I've seen that oops with RDMA; apparently it isn't actually that
> > rare if you tweak things just right.
> >
> > I thought it was an obscure ext4 bug :(
> >
> >> Because the obvious "fix" in device driver land is to use a dedicated
> >> buffer for DMA, and copy to the filesystem buffer, and of course I will
> >> get *killed* if I propose such a performance-killing approach. But a
> >> core kernel fix really is starting to sound attractive.
> >
> > Yeah, killed is right. That idea totally cripples RDMA.
> >
> > What is the point of get_user_pages FOLL_WRITE if you can't write to
> > and dirty the pages!?!
> 
> You're oversimplifying the problem; here are the details:
> 
> https://www.spinics.net/lists/linux-mm/msg142700.html

Suggestion 1:

In get_user_pages_fast(), mark the page as dirty, but don't tag the radix
tree entry as dirty.  Then vmscan() won't find it when it's looking to
write out dirty pages.  Only mark it as dirty in the radix tree once we
call set_page_dirty_lock().
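A toy user-space model may make the two steps of Suggestion 1 concrete. Everything in it is hypothetical: toy_page, toy_slot and the toy_* functions stand in for struct page, the radix tree entry and its dirty tag, get_user_pages_fast(), set_page_dirty_lock() and the writeback scan. None of it is kernel code. As the suggestion assumes, the scan only looks at entries whose dirty tag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the dirty flag matters. */
struct toy_page {
    bool page_dirty;            /* models PG_dirty on the page itself */
};

/* Hypothetical stand-in for one page cache (radix tree) slot. */
struct toy_slot {
    struct toy_page *page;
    bool tag_dirty;             /* models the radix tree dirty tag */
};

/* Step 1 (pin time): mark the page dirty, leave the entry untagged. */
void toy_gup_pin(struct toy_slot *slot)
{
    slot->page->page_dirty = true;
}

/* Step 2 (I/O done): now tag the entry so the scan can see it. */
void toy_set_page_dirty_lock(struct toy_slot *slot)
{
    slot->page->page_dirty = true;
    slot->tag_dirty = true;
}

/* Tag-driven scan: visits only tagged entries, "writes" them out by
 * clearing both the tag and the page flag, and returns how many it wrote. */
size_t toy_writeback_scan(struct toy_slot *slots, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].tag_dirty)
            continue;           /* untagged entries are invisible here */
        slots[i].tag_dirty = false;
        slots[i].page->page_dirty = false;
        written++;
    }
    return written;
}
```

In this model, a scan run while the page is pinned writes nothing even though the page itself is dirty; after the completion step the same scan picks it up.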

Suggestion 2:

In get_user_pages_fast(), replace the page in the radix tree with a special
entry that means "page under io".  In set_page_dirty_lock(), replace the
"page under io" entry with the struct page pointer.
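Suggestion 2 can be sketched the same way, again as a hypothetical user-space model rather than kernel code. The slot holds either a page pointer or a sentinel address standing in for the "page under io" entry, and toy_lookup() plays the role of a page cache lookup. For simplicity the pin step takes the page from the slot; in the kernel, get_user_pages_fast() walks the process page tables rather than looking the page up in the page cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
    int id;
};

/* Hypothetical "page under io" marker: a unique address that can never
 * be a real struct toy_page. */
static char toy_under_io_marker;
#define TOY_PAGE_UNDER_IO ((void *)&toy_under_io_marker)

/* One page cache slot: holds a struct toy_page * or TOY_PAGE_UNDER_IO. */
struct toy_slot {
    void *entry;
};

/* Lookup as seen by page cache walkers: the marker means "no page". */
struct toy_page *toy_lookup(const struct toy_slot *slot)
{
    if (slot->entry == TOY_PAGE_UNDER_IO)
        return NULL;
    return slot->entry;
}

/* Pin time: take the page out of the slot and leave the marker behind. */
struct toy_page *toy_gup_pin(struct toy_slot *slot)
{
    struct toy_page *page = toy_lookup(slot);
    assert(page != NULL);       /* this model handles a single pinner only */
    slot->entry = TOY_PAGE_UNDER_IO;
    return page;
}

/* Completion: swap the marker back for the real page pointer. */
void toy_set_page_dirty_lock(struct toy_slot *slot, struct toy_page *page)
{
    assert(slot->entry == TOY_PAGE_UNDER_IO);
    slot->entry = page;
}
```

While the marker is in place, anything that finds pages through the slot sees no page at all, so writeback has nothing to act on until the completion path restores the pointer.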

Both of these suggestions have trouble with simultaneous sub-page IOs to the
same page.  Do we care?  I suspect we might, as pages get larger (see also:
supporting THP pages in the page cache).
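The sub-page problem can be illustrated with a tiny hypothetical model of Suggestion 1 (all names invented for illustration). Two I/Os pin the same page, the first completion tags the entry dirty, and from that point a writeback scan could pick the page up while the second DMA is still running. The dma_in_flight counter exists only so the model can detect the overlap; it is not part of either suggestion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state under Suggestion 1. */
struct toy_slot {
    bool tag_dirty;       /* entry visible to the writeback scan */
    int dma_in_flight;    /* model-only bookkeeping of outstanding I/Os */
};

/* Pin for one sub-page I/O: page dirty, entry left untagged. */
void toy_io_start(struct toy_slot *slot)
{
    slot->dma_in_flight++;
}

/* One I/O completes: set_page_dirty_lock() tags the entry. */
void toy_io_complete(struct toy_slot *slot)
{
    assert(slot->dma_in_flight > 0);
    slot->dma_in_flight--;
    slot->tag_dirty = true;
}

/* True if writeback could see the page while DMA to it is still running. */
bool toy_writeback_races_dma(const struct toy_slot *slot)
{
    return slot->tag_dirty && slot->dma_in_flight > 0;
}
```

A single tag bit cannot express "some, but not all, of the I/Os to this page are done", which is the gap the question above points at.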