On Tue, Sep 22, 2020 at 10:28:02AM -0400, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 08:54:36AM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 12:47:11AM +0200, Jann Horn wrote:
> > > On Tue, Sep 22, 2020 at 12:30 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > > On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> > > > > On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > > > > (Commit message collected from Jason Gunthorpe)
> > > > > >
> > > > > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > > > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > > > > that have never been passed to pin_user_pages() cannot have a positive
> > > > > > page_maybe_dma_pinned() by definition.
> > > > >
> > > > > There are some caveats here, right? E.g. this isn't necessarily true
> > > > > for pagecache pages, I think?
> > > >
> > > > Sorry I didn't follow here. Could you help explain with some details?
> > >
> > > The commit message says "mm_structs that have never been passed to
> > > pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
> > > definition"; but that is not true for pages which may also be mapped
> > > in a second mm and may have been passed to pin_user_pages() through
> > > that second mm (meaning they must be writable over there and not
> > > shared with us via CoW).
> >
> > The message does need a few more words to explain this trick can only
> > be used with COW'able pages.
> >
> > > Process A:
> > >
> > > fd_a = open("/foo/bar", O_RDWR);
> > > mapping_a = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd_a, 0);
> > > pin_user_pages(mapping_a, 1, ...);
> > >
> > > Process B:
> > >
> > > fd_b = open("/foo/bar", O_RDONLY);
> > > mapping_b = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd_b, 0);
> > > *(volatile char *)mapping_b;
> > >
> > > At this point, process B has never called pin_user_pages(), but
> > > page_maybe_dma_pinned() on the page at mapping_b would return true.
> >
> > My expectation is the pin_user_pages() should have already broken the
> > COW for the MAP_PRIVATE, so process B should not have a
> > page_maybe_dma_pinned()
>
> When process B maps with PROT_READ only (w/o PROT_WRITE) then it seems the same
> page will be mapped.

I thought MAP_PRIVATE without PROT_WRITE was nonsensical; it only has
meaning for writes initiated by the mapping. MAP_SHARED/PROT_READ is
the same behavior on Linux, IIRC.

But, yes, you certainly can end up with B having
page_maybe_dma_pinned() pages in a shared VMA, just not in COW'able
mappings.

> I think I get the point from Jann now. Maybe it's easier I just remove the
> whole "mm_structs that have never been passed to pin_user_pages() cannot have a
> positive page_maybe_dma_pinned() by definition" sentence if that's misleading,
> because the rest seem to be clear enough on what this new field is used for.

"for COW" I think is still the important detail here, see for instance
my remark on the PUD/PMD splitting where it is necessary to test for
cow before using this.

Perhaps we should call it "has_pinned_for_cow" to place emphasis on
this detail? Due to the shared pages issue it really doesn't have any
broader utility, e.g. for file-backed pages or otherwise.

Jason
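
To make the "test for cow before using this" ordering concrete, here is a
rough userspace model of the check being discussed. It is only a sketch:
struct mm_model, the MODEL_* flag values and both function names are
hypothetical stand-ins invented for illustration, not kernel code, and the
per-page pinned state is passed in as a plain bool rather than derived from
a real page.

```c
#include <stdbool.h>

/* Illustrative flag values for this model only; not taken from kernel headers. */
#define MODEL_VM_SHARED   0x1u
#define MODEL_VM_MAYWRITE 0x2u

/* Hypothetical stand-in for the per-mm "has ever pinned" field in the thread. */
struct mm_model {
    bool has_pinned;
};

/* A mapping is COW'able when it is private but may become writable. */
static bool model_is_cow_mapping(unsigned int vm_flags)
{
    return (vm_flags & (MODEL_VM_SHARED | MODEL_VM_MAYWRITE)) == MODEL_VM_MAYWRITE;
}

/*
 * Returns true when the pinned state of a page should influence a COW
 * decision. The mm-wide flag is consulted only after the mapping is known
 * to be COW'able; for shared mappings it carries no information.
 */
static bool model_pinned_matters_for_cow(const struct mm_model *mm,
                                         unsigned int vm_flags,
                                         bool page_maybe_pinned)
{
    if (!model_is_cow_mapping(vm_flags))
        return false;            /* not a COW mapping: the flag does not apply */
    if (!mm->has_pinned)
        return false;            /* this mm never pinned: skip the per-page test */
    return page_maybe_pinned;    /* fall back to the (possibly false positive) page test */
}
```

With this ordering, a shared mapping never reaches the has_pinned test, which
is the restriction the "has_pinned_for_cow" name is meant to emphasize.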