On 02/10/19 21:27, Jerome Glisse wrote: > On Tue, Sep 10, 2019 at 07:49:51AM +0000, Mircea CIRJALIU - MELIU wrote: >>> On 05/09/19 20:09, Jerome Glisse wrote: >>>> Not sure i understand, you are saying that the solution i outline >>>> above does not work ? If so then i think you are wrong, in the above >>>> solution the importing process mmap a device file and the resulting >>>> vma is then populated using insert_pfn() and constantly keep >>>> synchronize with the target process through mirroring which means that >>>> you never have to look at the struct page ... you can mirror any kind >>>> of memory from the remote process. >>> >>> If insert_pfn in turn calls MMU notifiers for the target VMA (which would be >>> the KVM MMU notifier), then that would work. Though I guess it would be >>> possible to call MMU notifier update callbacks around the call to insert_pfn. >> >> Can't do that. >> First, insert_pfn() uses set_pte_at() which won't trigger the MMU notifier on >> the target VMA. It's also static, so I'll have to access it thru vmf_insert_pfn() >> or vmf_insert_mixed(). > > Why would you need to target mmu notifier on target vma ? If the mapping of the source VMA changes, mirroring can update the target VMA via insert_pfn. But what ensures that KVM's MMU notifier dismantles its own existing page tables (so that they can be recreated with the new mapping from the source VMA)? Thanks, Paolo > You do not need > that. The workflow is: > > userspace: > ptr = mmap(/dev/kvm-mirroring-device, virtual_addresse_of_target) > > Then when the mirroring process access ptr it triggers page fault that > endup in the vm_operation_struct->fault() which is just doing: > > kernel-kvm-mirroring-function: > kvm_mirror_page_fault(struct vm_fault *vmf) { > struct kvm_mirror_struct *kvmms; > > kvmms = kvm_mirror_struct_from_file(vmf->vma->vm_file); > ... > again: > hmm_range_register(&range); > hmm_range_snapshot(&range); > take_lock(kvmms->update); > if (!hmm_range_valid(&range)) { > vm_insert_pfn(); > drop_lock(kvmms->update); > hmm_range_unregister(&range); > return VM_FAULT_NOPAGE; > } > drop_lock(kvmms->update); > goto again; > } > > The notifier callback: > kvmms_notifier_start() { > take_lock(kvmms->update); > clear_pte(start, end); > drop_lock(kvmms->update); > } > >> >> Our model (the importing process is encapsulated in another VM) forces us >> to mirror certain pages from the anon VMA backing one VM's system RAM to >> the other VM's anon VMA. > > The mirror does not have to be an anon vma it can very well be a > device vma ie mmap of a device file. I do not see any reasons why > the mirror need to be an anon vma. Please explain why. > >> >> Using the functions above means setting VM_PFNMAP|VM_MIXEDMAP on >> the target anon VMA, but I guess this breaks the VMA. Is this recommended? > > The mirror vma should not be an anon vma. > >> >> Then, mapping anon pages from one VMA to another without fixing the >> refcount and the mapcount breaks the daemons that think they're working >> on a pure anon VMA (kcompactd, khugepaged). > > Note here the target vma ie the mirroring one is a mmap of device file > and thus is skip by all of the above (kcompactd, khugepaged, ...) it is > fully ignore by core mm. > > Thus you do not need to fix the refcount in any way. If any of the core > mm try to reclaim memory from the original vma then you will get mmu > notifier callbacks and all you have to do is clear the page table of your > device vma. > > I did exactly that as a tools in the past and it works just fine with > no change to core mm whatsoever. > > Cheers, > Jérôme >