On Mon, May 02, 2016 at 01:15:13PM +0200, Jerome Glisse wrote:
> On Mon, May 02, 2016 at 01:41:19PM +0300, Kirill A. Shutemov wrote:
> > Other thing I would like to discuss is if there's a problem on vfio side.
> > To me it looks like vfio expects a guarantee from get_user_pages() which it
> > doesn't provide: obtaining a pin on the page doesn't guarantee that the page
> > is going to remain mapped into userspace until the pin is gone.
> >
> > Even with the THP COW regression fixed, vfio would stay fragile: any
> > MADV_DONTNEED/fork()/mremap()/whatever would break vfio's expectation.
> >
>
> Well, I don't think that is a fair/accurate assessment of get_user_pages(): the
> page must remain mapped to the same virtual address until the pin is gone. I am
> ignoring mremap() as it is a conscious decision from userspace, and while the
> virtual address changes in that case, the pinned page behind it should move
> with the mapping. Same for MADV_DONTNEED. I agree that get_user_pages() is
> broken after fork(), but this has been the case since the dawn of time, so it
> is something expected.
>
> If not vfio, then direct-io has been expecting this kind of behavior for a
> long time, so I see this as part of the get_user_pages() guarantee.
>
> Concerning vfio, not providing this guarantee will break a countless number of
> workloads. Things like qemu/kvm allocate anonymous memory and hand it over to
> the guest kernel, which presents it as memory. Now a device driver inside the
> guest kernel needs to get a bus mapping for a given (guest) page, which from
> the host point of view means a mapping from an anonymous page to a bus mapping,
> but for the guest to keep accessing the same page the anonymous mapping (i.e. a
> specific virtual address on the host side) must keep pointing to the same
> page. This has been the case with get_user_pages() until now, so whether
> we like it or not we must keep that guarantee.
>
> This kind of workload knows that they can't do mremap()/fork()/...
> and keep that guarantee, but they at least expect the existing guarantee, and
> I don't think we can break that.

Quick look around:

 - I don't see any page_count() check around __replace_page() in uprobes, so
   it can easily replace a pinned page.

 - KSM has the page_count() check, but there's still a race wrt. GUP_fast: it
   can take the pin between the check and establishing the new pte entry.

 - khugepaged: the same story as with KSM.

I don't see how we can deliver on the guarantee, especially with lockless
GUP_fast. Or am I missing something important?

-- 
 Kirill A. Shutemov
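
To make the KSM/khugepaged window concrete, here is a rough userspace sketch of the check-then-replace race against a lockless pin. It is a toy model, not kernel code: struct toy_page, struct toy_pte and every toy_* function are made up for illustration, and the real GUP_fast, KSM and khugepaged paths do considerably more than this. The sketch only replays one bad interleaving deterministically: the replacer's refcount check passes, the pinner takes its reference, and then the replacer installs the new pte anyway.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only a reference count (hypothetical). */
struct toy_page {
	atomic_int refcount;
};

/* Toy stand-in for a pte: which page one virtual address maps (hypothetical). */
struct toy_pte {
	_Atomic(struct toy_page *) page;
};

/* Replacer, step 1: a mapped page with no pins holds exactly one reference. */
static bool toy_page_is_unpinned(struct toy_page *p)
{
	return atomic_load(&p->refcount) == 1;
}

/* Pinner: lockless lookup that reads the pte and bumps the refcount. */
static struct toy_page *toy_gup_fast(struct toy_pte *pte)
{
	struct toy_page *p = atomic_load(&pte->page);

	atomic_fetch_add(&p->refcount, 1);
	return p;
}

/* Replacer, step 2: install the new page and drop the old mapping's reference. */
static void toy_replace_pte(struct toy_pte *pte, struct toy_page *new_page)
{
	struct toy_page *old = atomic_exchange(&pte->page, new_page);

	atomic_fetch_sub(&old->refcount, 1);
}

/*
 * Replay the bad interleaving in a single thread so the result is
 * deterministic. Returns true if the pinned page is no longer the page
 * mapped at the virtual address.
 */
bool toy_race_loses_mapping(void)
{
	struct toy_page old_page, new_page;
	struct toy_pte pte;
	struct toy_page *pinned;
	bool check_passed;

	atomic_init(&old_page.refcount, 1);	/* reference held by the mapping */
	atomic_init(&new_page.refcount, 1);
	atomic_init(&pte.page, &old_page);

	check_passed = toy_page_is_unpinned(&old_page);	/* replacer: check */
	pinned = toy_gup_fast(&pte);			/* pinner slips in */
	assert(pinned == &old_page);
	if (check_passed)
		toy_replace_pte(&pte, &new_page);	/* replacer: stale act */

	return atomic_load(&pte.page) != pinned;
}
```

In this model toy_race_loses_mapping() returns true: the pin is still held on old_page, but the pte now points at new_page, which is exactly the situation a vfio-style user cannot tolerate.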