On Wed, Jul 21, 2021 4:20 PM +0000, David Hildenbrand wrote:
> On 21.07.21 16:38, Ivan Teterevkov wrote:
> > On Mon, Jul 19, 2021 5:56 PM +0000, Peter Xu wrote:
> >> I'm also curious what would be the real use to have an accurate
> >> PM_SWAP accounting. To me current implementation may not provide
> >> accurate value but should be good enough for most cases. However not
> >> sure whether it's also true for your use case.
> >
> > We want the PM_SWAP bit implemented (for shared memory in the pagemap
> > interface) to enhance the live migration for some fraction of the
> > guest VMs that have their pages swapped out to the host swap. Once
> > those pages are paged in and transferred over network, we then want to
> > release them with madvise(MADV_PAGEOUT) and preserve the working set
> > of the guest VMs to reduce the thrashing of the host swap.
>
> There are 3 possibilities I think (swap is just another variant of the page cache):
>
> 1) The page is not in the page cache, e.g., it resides on disk or in a swap file.
>    pte_none().
> 2) The page is in the page cache and is not mapped into the page table.
>    pte_none().
> 3) The page is in the page cache and mapped into the page table.
>    !pte_none().
>
> Do I understand correctly that you want to identify 1) and indicate it via
> PM_SWAP?

Yes, and I also want to outline the context so we're on the same page.

This series introduces the support for userfaultfd-wp for shared memory
because once a shared page is swapped, its PTE is cleared. Upon retrieval
from a swap file, there's no way to "recover" the _PAGE_SWP_UFFD_WP flag
because unlike private memory it's not kept in PTE or elsewhere.

We came across the same issue with PM_SWAP in the pagemap interface, but
fortunately, there's the place that we could query: the i_pages field of
the struct address_space (XArray). In https://lkml.org/lkml/2021/7/14/595
we do it similarly to what shmem_fault() does when it handles #PF.
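
For reference, here is a minimal userspace sketch of how a consumer such
as our migration code might decode these bits from /proc/<pid>/pagemap.
It is not part of either patch. The bit positions are the ones given in
Documentation/admin-guide/mm/pagemap.rst; the helper names (pm_entry_swapped,
pm_entry_uffd_wp, pm_read_entry) are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes pread() visible under strict -std= modes */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Pagemap entry bits, per Documentation/admin-guide/mm/pagemap.rst. */
#define PM_UFFD_WP (1ULL << 57)
#define PM_SWAP    (1ULL << 62)
#define PM_PRESENT (1ULL << 63)

/* Hypothetical helper: true if the entry reports a swapped-out page. */
static inline bool pm_entry_swapped(uint64_t entry)
{
	return (entry & PM_SWAP) != 0 && (entry & PM_PRESENT) == 0;
}

/* Hypothetical helper: true if the entry reports uffd write protection. */
static inline bool pm_entry_uffd_wp(uint64_t entry)
{
	return (entry & PM_UFFD_WP) != 0;
}

/*
 * Hypothetical helper: read the 64-bit pagemap entry covering vaddr from
 * an already opened /proc/<pid>/pagemap descriptor.
 * Returns 0 on success and -1 on error or short read.
 */
static inline int pm_read_entry(int fd, uintptr_t vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);

	assert(entry != NULL);
	if (page_size <= 0)
		return -1;

	/* One 8-byte entry per virtual page, indexed by page number. */
	off_t off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(*entry);

	return pread(fd, entry, sizeof(*entry), off) == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

In the migration flow described above, the pages for which
pm_entry_swapped() was true before the transfer would be the candidates
for madvise(MADV_PAGEOUT) afterwards.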

Now, in the context of this series, we were exploring whether it makes
any practical sense to introduce more brand new flags to the special PTE
to populate the pagemap flags "on the spot" from the given PTE.

However, I can't see how (and why) to achieve that specifically for
PM_SWAP even with an extra bit: the XArray is precisely what we need for
the live migration use case. Another flag, PM_SOFT_DIRTY, suffers from the
same problem as UFFD_WP_SWP_PTE_SPECIAL did before this patch series, but
we don't need it at the moment.

Hope that clarification makes sense?

The only outstanding note I have is about the compatibility of our
patches around pte_to_pagemap_entry(). I think the resulting code
should look like this:

	static pagemap_entry_t pte_to_pagemap_entry(...)
	{
		if (pte_present(pte)) {
			...
		} else if (is_swap_pte(pte) || shmem_file(vma->vm_file)) {
			...
			if (pte_swp_uffd_wp_special(pte)) {
				flags |= PM_UFFD_WP;
			}
		}
	}

The is_swap_pte() branch will be taken for the swapped out shared pages,
thanks to shmem_file(), so the pte_swp_uffd_wp_special() can be checked
inside.

Alternatively, we could just remove the "else" in front of the
pte_swp_uffd_wp_special() check:

	static pagemap_entry_t pte_to_pagemap_entry(...)
	{
		if (pte_present(pte)) {
			...
		} else if (is_swap_pte(pte) || shmem_file(vma->vm_file)) {
			...
		}

		if (pte_swp_uffd_wp_special(pte)) {
			flags |= PM_UFFD_WP;
		}
	}

What do you reckon?

Thanks,
Ivan