On Fri, May 18, 2018 at 07:33:41PM -0700, John Hubbard wrote:
> On 05/18/2018 01:23 PM, Dan Williams wrote:
> > On Fri, May 18, 2018 at 10:36 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >> On Fri, May 18, 2018 at 04:47:48PM +0000, Christopher Lameter wrote:
> >>> On Fri, 18 May 2018, Jason Gunthorpe wrote:
> >>>
> >>>
> >>> The newcomer here is RDMA. The FS side is the mainstream use case and has
> >>> been there since Unix learned to do paging.
> >>
> >> Well, it has been this way for 12 years, so it isn't that new.
> >>
> >> Honestly it sounds like get_user_pages is just a broken Linux
> >> API??
> >>
> >> Nothing can use it to write to pages because the FS could explode -
> >> RDMA makes it particularly easy to trigger this due to the longer time
> >> windows, but presumably any get_user_pages could generate a race and
> >> hit this? Is that right?
>
> +1, and I am now super-interested in this conversation, because
> after tracking down a kernel BUG to this classic mistaken pattern:
>
> get_user_pages (on file-backed memory from ext4)
> ...do some DMA
> set_pages_dirty
> put_page(s)

Ummm, RDMA has done essentially that since 2005; since when did it
become wrong? Do you have some references? Is there some alternative?

See __ib_umem_release

> ...there is (rarely!) a backtrace from ext4, that disavows ownership of
> any such pages.

Yes, I've seen that oops with RDMA; apparently it isn't actually that
rare if you tweak things just right.

I thought it was an obscure ext4 bug :(

> Because the obvious "fix" in device driver land is to use a dedicated
> buffer for DMA, and copy to the filesystem buffer, and of course I will
> get *killed* if I propose such a performance-killing approach. But a
> core kernel fix really is starting to sound attractive.

Yeah, killed is right. That idea totally cripples RDMA.

What is the point of get_user_pages FOLL_WRITE if you can't write to
and dirty the pages!?!
Jason