On Tue, Sep 10, 2019 at 07:49:51AM +0000, Mircea CIRJALIU - MELIU wrote:
> > On 05/09/19 20:09, Jerome Glisse wrote:
> > > Not sure i understand, you are saying that the solution i outline
> > > above does not work ? If so then i think you are wrong, in the above
> > > solution the importing process mmap a device file and the resulting
> > > vma is then populated using insert_pfn() and constantly keep
> > > synchronize with the target process through mirroring which means that
> > > you never have to look at the struct page ... you can mirror any kind
> > > of memory from the remote process.
> >
> > If insert_pfn in turn calls MMU notifiers for the target VMA (which would be
> > the KVM MMU notifier), then that would work.  Though I guess it would be
> > possible to call MMU notifier update callbacks around the call to insert_pfn.
>
> Can't do that.
> First, insert_pfn() uses set_pte_at() which won't trigger the MMU notifier on
> the target VMA. It's also static, so I'll have to access it thru vmf_insert_pfn()
> or vmf_insert_mixed().

Why would you need to trigger the mmu notifier on the target vma? You do
not need that. The workflow is:

    userspace:
        ptr = mmap(/dev/kvm-mirroring-device, virtual_address_of_target)

Then, when the mirroring process accesses ptr, it triggers a page fault
that ends up in vm_operations_struct->fault(), which just does:

    kernel-kvm-mirroring-function:
        kvm_mirror_page_fault(struct vm_fault *vmf)
        {
            struct kvm_mirror_struct *kvmms;

            kvmms = kvm_mirror_struct_from_file(vmf->vma->vm_file);
            ...
        again:
            hmm_range_register(&range);
            hmm_range_snapshot(&range);
            take_lock(kvmms->update);
            if (hmm_range_valid(&range)) {
                /* snapshot is still valid, safe to fill the mirror pte */
                vm_insert_pfn();
                drop_lock(kvmms->update);
                hmm_range_unregister(&range);
                return VM_FAULT_NOPAGE;
            }
            /* range was invalidated after the snapshot, retry */
            drop_lock(kvmms->update);
            hmm_range_unregister(&range);
            goto again;
        }

The notifier callback:

        kvmms_notifier_start()
        {
            take_lock(kvmms->update);
            clear_pte(start, end);
            drop_lock(kvmms->update);
        }

>
> Our model (the importing process is encapsulated in another VM) forces us
> to mirror certain pages from the anon VMA backing one VM's system RAM to
> the other VM's anon VMA.

The mirror does not have to be an anon vma; it can very well be a device
vma, i.e. an mmap of a device file. I do not see any reason why the mirror
needs to be an anon vma. Please explain why.

>
> Using the functions above means setting VM_PFNMAP|VM_MIXEDMAP on
> the target anon VMA, but I guess this breaks the VMA. Is this recommended?

The mirror vma should not be an anon vma.

>
> Then, mapping anon pages from one VMA to another without fixing the
> refcount and the mapcount breaks the daemons that think they're working
> on a pure anon VMA (kcompactd, khugepaged).

Note that here the target vma, i.e. the mirroring one, is an mmap of a
device file and is thus skipped by all of the above (kcompactd,
khugepaged, ...); it is fully ignored by core mm. So you do not need to
fix the refcount in any way. If core mm tries to reclaim memory from the
original vma, you will get mmu notifier callbacks, and all you have to do
is clear the page table of your device vma.

I did exactly that in a tool in the past and it works just fine with no
change to core mm whatsoever.

Cheers,
Jérôme