On Sat, Mar 21, 2020 at 09:22:36AM +0100, Christoph Hellwig wrote: > On Fri, Mar 20, 2020 at 10:41:09AM -0300, Jason Gunthorpe wrote: > > Thinking about this some more, does the locking work out here? > > > > hmm_range_fault() runs with mmap_sem in read, and does not lock any of > > the page table levels. > > > > So it relies on accessing stale pte data being safe, and here we > > introduce for the first time a page pointer dereference and a pgmap > > dereference without any locking/refcounting. > > > > The get_dev_pagemap() worked on the PFN and obtained a refcount, so it > > created safety. > > > > Is there some tricky reason this is safe, eg a DEVICE_PRIVATE page > > cannot be removed from the vma without holding mmap_sem in write or > > something? > > I don't think there is any specific protection. Let me see if we > can throw in a get_dev_pagemap here The page tables are RCU protected right? could we do something like if (is_device_private_entry()) { rcu_read_lock() if (READ_ONCE(*ptep) != pte) return -EBUSY; hmm_is_device_private_entry() rcu_read_unlock() } ? Then pgmap needs a synchronize_rcu before the struct page's are destroyed (possibly gup_fast already requires this?) I've got some other patches trying to close some of these styles of bugs, but > note that current mainline doesn't even use it for this path.. Don't follow? Jason