> > I've spoken with Stephen Hemminger, and he said that DPDK is moving in
> > the direction of using transparent huge pages instead of HugeTLBs,
> > which means that we need to allow at least anonymous, and anonymous
> > transparent huge pages to come from non-movable zones on demand.
> >
>
> I'd like to know more about this use case, ZONE_MOVABLE is typically a
> great way to optimize for thp availability because, absent memory pinning,
> this memory can always be defragmented. So the idea is that DPDK will now
> allocate all of its thp from ZONE_NORMAL or only a small subset? Seems
> like an invitation for oom kill if the sizing of ZONE_NORMAL is
> insufficient.

The idea is to allocate only those THP and anon pages that are long
term pinned from ZONE_NORMAL, the rest can still be allocated from
ZONE_MOVABLE.

>
> > Here is what I am proposing:
> > 1. Add a new flag that is passed through pin_user_pages_* down to
> > fault handlers, and allow the fault handler to allocate from a
> > non-movable zone.
> >
> > Sample function stacks through which this info needs to be passed is this:
> >
> > pin_user_pages_remote(gup_flags)
> >  __get_user_pages_remote(gup_flags)
> >   __gup_longterm_locked(gup_flags)
> >    __get_user_pages_locked(gup_flags)
> >     __get_user_pages(gup_flags)
> >      faultin_page(gup_flags)
> >       Convert gup_flags into fault_flags
> >       handle_mm_fault(fault_flags)
> >
> > From handle_mm_fault(), the stack diverges into various faults,
> > examples include:
> >
> > Transparent Huge Page
> > handle_mm_fault(fault_flags)
> >  __handle_mm_fault(fault_flags)
> >   Create: struct vm_fault vmf, use fault_flags to specify correct gfp_mask
> >   create_huge_pmd(vmf);
> >    do_huge_pmd_anonymous_page(vmf);
> >     mm_get_huge_zero_page(vma->vm_mm); -> flag is lost, so flag from
> >     vmf.gfp_mask should be passed as well.
> >
> > There are several other similar paths in a transparent huge page, also
> > there is a named path where allocation is based on filesystems, and
> > the flag should be honored there as well, but it does not have to be
> > added at the same time.
> >
> > Regular Pages
> > handle_mm_fault(fault_flags)
> >  __handle_mm_fault(fault_flags)
> >   Create: struct vm_fault vmf, use fault_flags to specify correct gfp_mask
> >   handle_pte_fault(vmf)
> >    do_anonymous_page(vmf);
> >     page = alloc_zeroed_user_highpage_movable(vma, vmf->address); ->
> >     change this call according to gfp_mask.
> >
>
> This would likely be useful for AMD SEV as well, which requires guest
> pages to be pinned because the encryption algorithm depends on the host
> physical address. This ensures that plaintext memory for two pages doesn't
> result in the same ciphertext.
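
To make the flag plumbing in the proposal above more concrete, here is a
rough userspace model of the two conversion steps (gup_flags to
fault_flags in faultin_page(), then fault_flags to vmf.gfp_mask in
__handle_mm_fault()). It is only a sketch: every SK_* constant and sk_*
function is a made-up stand-in, the numeric values are not the kernel's,
and SK_FOLL_NOMOVABLE / SK_FAULT_FLAG_NOMOVABLE are hypothetical names
for the new flag, which does not exist yet.

```c
#include <stdbool.h>

/* Stand-in flag values for this userspace sketch; NOT the kernel's definitions. */
#define SK_FOLL_WRITE           0x01u
#define SK_FOLL_NOMOVABLE       0x04u /* hypothetical: the proposed new gup flag */

#define SK_FAULT_FLAG_WRITE     0x01u
#define SK_FAULT_FLAG_NOMOVABLE 0x02u /* hypothetical: fault-side twin of the gup flag */

#define SK_GFP_HIGHUSER         0x10u /* models the base user allocation mask */
#define SK_GFP_MOVABLE          0x20u /* models the "may come from a movable zone" bit */

/* Models the "Convert gup_flags into fault_flags" step in faultin_page(). */
static unsigned int sk_gup_to_fault_flags(unsigned int gup_flags)
{
	unsigned int fault_flags = 0;

	if (gup_flags & SK_FOLL_WRITE)
		fault_flags |= SK_FAULT_FLAG_WRITE;
	if (gup_flags & SK_FOLL_NOMOVABLE)
		fault_flags |= SK_FAULT_FLAG_NOMOVABLE;

	return fault_flags;
}

/* Models choosing vmf.gfp_mask from fault_flags in __handle_mm_fault(). */
static unsigned int sk_fault_gfp_mask(unsigned int fault_flags)
{
	/* Default: anonymous user memory is allowed to be movable. */
	unsigned int gfp = SK_GFP_HIGHUSER | SK_GFP_MOVABLE;

	/* A long-term pin asked for non-movable memory: drop the movable bit. */
	if (fault_flags & SK_FAULT_FLAG_NOMOVABLE)
		gfp &= ~SK_GFP_MOVABLE;

	return gfp;
}

/* End to end: would a fault triggered with these gup_flags allocate movable memory? */
static bool sk_alloc_is_movable(unsigned int gup_flags)
{
	unsigned int fault_flags = sk_gup_to_fault_flags(gup_flags);

	return (sk_fault_gfp_mask(fault_flags) & SK_GFP_MOVABLE) != 0;
}
```

In this model an ordinary write fault keeps the movable bit, and only a
caller that sets the hypothetical no-movable flag loses it, so everything
that is not long term pinned still lands in ZONE_MOVABLE. Leaf allocators
such as do_anonymous_page() would then consult the mask instead of
hard-coding the movable variant.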