On 10/15/21 00:04, Jason Gunthorpe wrote:
> 2) Denying FOLL_LONGTERM
>    Once GUP has grabbed the page we can call is_zone_device_page() on
>    the struct page. If true we can check page->pgmap and read some
>    DENY_FOLL_LONGTERM flag from there
>
I had proposed something similar to that:

https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10fef0@xxxxxxxxxx/

Albeit I was using pgmap->type and was relying on the get_dev_pagemap() ref,
as opposed to checking after grabbing the page. I can resurrect that with
some adjustments to use pgmap flags to check a DENY_LONGTERM flag (and set it
on fsdax[*]) and move the check to after try_grab_page(). That is provided
the other alternative with the special page bit isn't an option anymore.

[*] which raises the question of whether fsdax is the *only* one that needs
the flag?

> 3) Different refcounts for pud/pmd pages
>
>    Ideally DAX cases would not do this (ie Joao is fixing device-dax)
>    but in the interim we can just loop over the PUD/PMD in all
>    cases. Looping is safe for THP AFAIK. I described how this can work
>    here:
>
>    https://lore.kernel.org/all/20211013174140.GJ2744544@xxxxxxxxxx/
>
> After that there are only two remaining uses:
>
> 4) The pud/pmd_devmap() in vm_normal_page() should just go
>    away. ZONE_DEVICE memory with struct pages SHOULD be a normal
>    page. This also means dropping pte_special too.
>
> 5) dev_pagemap_mapping_shift() - I don't know what this does
>    but why not use the is_zone_device_page() approach from 2?
>
dev_pagemap_mapping_shift() does a lookup to figure out which order the
page table entry represents. is_zone_device_page() is already used to gate
usage of dev_pagemap_mapping_shift(). I think this might be an artifact of
the same issue as 3), in which PMDs/PUDs are represented with base pages and
hence you can't do what the rest of the world does with:

	tk->size_shift = page_shift(compound_head(p));

... as page_shift() would just return PAGE_SHIFT (as compound_order() is 0).
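
To make that last point concrete, here is a small userspace toy model, not
kernel code. struct toy_page and all toy_* helpers are made up for
illustration and only mimic the shape of compound_head(), compound_order()
and page_shift(); 4K base pages and order-9 PMDs are assumed. It is only
meant to show why the shift comes out as PAGE_SHIFT when a PMD mapping is
backed by base pages:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */
#define PMD_ORDER   9u	/* assumption: 2 MiB PMD = 512 base pages */

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	struct toy_page *head;	/* compound head, or the page itself */
	unsigned int order;	/* only meaningful on a head page */
};

static struct toy_page *toy_compound_head(struct toy_page *p)
{
	assert(p != NULL);
	return p->head;
}

/* Mimics compound_order(): 0 unless called on a head page. */
static unsigned int toy_compound_order(struct toy_page *p)
{
	return p->head == p ? p->order : 0;
}

/* Mimics page_shift(): PAGE_SHIFT plus the compound order. */
static unsigned int toy_page_shift(struct toy_page *p)
{
	return PAGE_SHIFT + toy_compound_order(p);
}

/* THP-like case: a tail page that points at an order-9 head. */
static unsigned int toy_shift_for_thp_tail(void)
{
	struct toy_page head = { .head = &head, .order = PMD_ORDER };
	struct toy_page tail = { .head = &head, .order = 0 };

	return toy_page_shift(toy_compound_head(&tail));
}

/* PMD mapped with base pages: every page is its own order-0 head. */
static unsigned int toy_shift_for_base_page_pmd(void)
{
	struct toy_page page = { .head = &page, .order = 0 };

	return toy_page_shift(toy_compound_head(&page));
}
```

In this model toy_shift_for_thp_tail() gives 21 (a 2M mapping), while
toy_shift_for_base_page_pmd() gives 12 even though the page table entry
would be a PMD, so the order has to come from somewhere else, such as a
page table lookup.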