On Thu, Oct 1, 2020 at 11:17 AM Ralph Campbell <rcampbell@xxxxxxxxxx> wrote: > > This is still an RFC because after looking at the pmem/dax code some > more, I realized that the ZONE_DEVICE struct pages are being inserted > into the process' page tables with vmf_insert_mixed() and a zero > refcount on the ZONE_DEVICE struct page. This is sort of OK because > insert_pfn() increments the reference count on the pgmap which is what > prevents memunmap_pages() from freeing the struct pages and it doesn't > check for a non-zero struct page reference count. > But, any calls to get_page() will hit the VM_BUG_ON_PAGE() that > checks for a reference count == 0. > > // mmap() an ext4 file that is mounted -o dax. > ext4_dax_fault() > ext4_dax_huge_fault() > dax_iomap_fault(&ext4_iomap_ops) > dax_iomap_pte_fault() > ops->iomap_begin() // ext4_iomap_begin() > ext4_map_blocks() > ext4_set_iomap() > dax_iomap_pfn() > dax_insert_entry() > vmf_insert_mixed(pfn) > __vm_insert_mixed() > if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && > !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) > insert_page() > get_page(page) // XXX would trigger VM_BUG_ON_PAGE() > page_add_file_rmap() > set_pte_at() > else > insert_pfn() > pte_mkdevmap() > set_pte_at() > > Should pmem set the page reference count to one before inserting the > pfn into the page tables (and decrement when removing devmap PTEs)? > What about MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA use cases? > Where should they icrement/decrement the page reference count? > I don't know enough about how these are used to really know what to > do at this point. If people want me to continue to work on this series, > I will need some guidance. fs/dax could take the reference when inserting, but that would mean that ext4 and xfs would need to go back to checking for 1 to be page idle. I think that's ok because the filesystem is actually not checking for page-idle it's checking for "get_user_pages()" idle.