On Wednesday, 19 May 2021 3:27:42 AM AEST Peter Xu wrote:
> > > The odd part is the remote GUP should have walked the page table
> > > already, so since the target here is the vaddr to replace, the 1st
> > > page table walk should be able to both trylock/lock the page, then
> > > modify the pte with pgtable lock held, return the locked page, then
> > > walk the rmap again to remove all the rest of the ptes that are
> > > mapping to this page. In that case before we call the rmap_walk() we
> > > know this must be the page we want to take care of, and no one will
> > > be able to restore the original mm pte either (as we're with the
> > > page lock). Then we don't need this check, neither do we need
> > > ttp->address.
> >
> > If I am understanding you correctly I think this would be similar to
> > the approach that was taken in v2. However it pretty much ended up
> > being just an open-coded version of gup which is useful anyway to
> > fault the page in.
>
> I see. For easier reference this is v2 patch 1:
>
> https://lore.kernel.org/lkml/20210219020750.16444-2-apopple@xxxxxxxxxx/

Sorry, I should have been clearer and just included that reference for you.

> Indeed that looks like it, it's just that instead of grabbing the page
> only in hmm_exclusive_pmd() we can do the pte modification along the way
> to seal the whole thing (address/pte & page). I saw Christoph and Jason
> commented in that patch, but not regarding to this approach. So is there
> a reason that you switched? Do you think it'll work?

I think the approach you are describing is similar to what
migrate_vma_collect()/migrate_vma_unmap() does now, and I think it could be
made to work. I ended up going with the GUP+unmap approach partly because
Christoph suggested it, but primarily because it required less code,
especially given we needed to call something to fault the page in/break COW
anyway (or extend what was there to call handle_mm_fault(), etc.).

> I have no strong opinion either, it's just not crystal clear why we'd
> need that ttp->address at all for a rmap walk along with that "valid"
> field.

It's purely to ensure the PTE pointing to the GUPed page was replaced with
an exclusive swap entry and that the mapping didn't change between calls.

> Meanwhile it should be slightly less efficient too to go with current
> approach, especially when the page array gets huge, I think: since
> there'll be longer time we do GUP before doing the rmap walk, so higher
> possibility that the GUPed pages got replaced for whatever reason. Then
> the call to make_device_exclusive_range() will fail as a whole just for
> a single page replacement within the range.
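
To make the ttp->address/"valid" point above a bit more concrete, here is a
rough sketch. It is not the actual patch; the struct, callback and helper
names below (ttp_args, ttp_replace_one, ttp_page) are paraphrased
placeholders just to show the bookkeeping the rmap walk does for each
GUPed page:

#include <linux/mm.h>
#include <linux/rmap.h>

struct ttp_args {
	struct mm_struct *mm;	/* mm the pages were GUPed from */
	unsigned long address;	/* vaddr whose PTE we expect to replace */
	void *owner;		/* tag stored in the device-exclusive entry */
	bool valid;		/* set once the expected PTE was replaced */
};

/* rmap_walk() callback, called for every mapping of the page. */
static bool ttp_replace_one(struct page *page, struct vm_area_struct *vma,
			    unsigned long addr, void *arg)
{
	struct ttp_args *ttp = arg;

	/*
	 * Under the page table lock the PTE is replaced with a
	 * device-exclusive swap entry (details elided).  Only the
	 * mapping the caller GUPed counts as success: if the page was
	 * replaced or remapped between the GUP and this walk, no PTE
	 * matches (mm, address) and valid stays false.
	 */
	if (vma->vm_mm == ttp->mm && addr == ttp->address)
		ttp->valid = true;

	return true;	/* keep walking the remaining mappings */
}

/* Per-page caller side, after GUP has returned 'page' for 'addr'. */
static bool ttp_page(struct page *page, struct mm_struct *mm,
		     unsigned long addr, void *owner)
{
	struct ttp_args ttp = {
		.mm = mm,
		.address = addr,
		.owner = owner,
	};
	struct rmap_walk_control rwc = {
		.rmap_one = ttp_replace_one,
		.arg = &ttp,
	};

	lock_page(page);
	rmap_walk(page, &rwc);
	unlock_page(page);

	return ttp.valid;
}

If the mapping changed between the GUP and the walk, nothing matches
(mm, address), valid stays false and the caller treats the page (and, in
the version being discussed, the whole range) as failed.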