On 08/15/2011 06:42 PM, Isaku Yamahata wrote:
> On Mon, Aug 15, 2011 at 12:29:37PM -0700, Avi Kivity wrote:
> > On 08/12/2011 04:07 AM, Isaku Yamahata wrote:
> >> This is a character device to hook page access.
> >> The page fault in the area is reported to another user process by
> >> this chardriver. Then, the process fills the page contents and
> >> resolves the page fault.
> >
> > Have you considered CUSE (character device in userspace, fs/fuse/cuse.c)?
>
> By looking at dev.c and cuse.c, it doesn't seem to support mmap and
> fault handler.
If performance is sufficient, this would be the preferred path. Enhance an existing API which can be useful to others, rather than add a new one.
> >> +
> >> +struct kvm_vmem_make_pages_present {
> >> +       __u32 nr;
> >> +       struct kvm_vmem_page_range __user *ranges;
> >> +};
> >
> > This is madvise(MADV_WILLNEED), is it not?
>
> Another process, not the qemu process, issues it, and it makes the pages
> present in the qemu process address space.
That process just issues these calls in a loop until all memory is present, yes? It seems those few lines could be easily added to qemu.
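To make the "calls in a loop" idea concrete, here is a minimal userspace sketch in C. It assumes the ranges are page offsets into a mapping the caller already holds. `struct page_range` and `make_ranges_present()` are hypothetical names for illustration, not part of the patch. Note that MADV_WILLNEED is only a hint to the kernel, so this alone does not guarantee the pages end up present.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace counterpart of a (page offset, page count) range. */
struct page_range {
    size_t pgoff;     /* first page of the range, counted in pages */
    size_t nr_pages;  /* number of pages in the range */
};

/*
 * Issue madvise(MADV_WILLNEED) for every range inside the mapping at `base`.
 * Returns 0 if every call succeeded, -1 on the first failure (errno is set
 * by sysconf() or madvise()).
 */
static int make_ranges_present(void *base, const struct page_range *ranges,
                               size_t nr)
{
    assert(base != NULL || nr == 0);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    for (size_t i = 0; i < nr; i++) {
        char *start = (char *)base + ranges[i].pgoff * (size_t)page_size;
        size_t len = ranges[i].nr_pages * (size_t)page_size;

        if (madvise(start, len, MADV_WILLNEED) != 0)
            return -1;
    }
    return 0;
}
```

A caller inside qemu would build the range array as pages arrive and call this function repeatedly until nothing is left to request.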
> > Can you explain these in some more detail?
>
> KVM_CREATE_VMEM_DEV: create vmem-dev device from kvm device
>   for qemu
> KVM_CREATE_VMEM: create vmem device from vmem-dev device.
>   (note: qemu creates more than one memory region.)
> KVM_VMEM_WAIT_READY: wait for KVM_VMEM_READY
>   for qemu
> KVM_VMEM_READY: unblock KVM_VMEM_WAIT_READY
>   for daemon uses
>
> These are for qemu and daemon to synchronise to enter postcopy stage.
These are eliminated if we fold the daemon into qemu. Also, we could just use a semaphore or other synchronization mechanism.
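As one possible shape for such a mechanism, the sketch below uses an eventfd in semaphore mode, shared across fork(), as a stand-in for the READY / WAIT_READY pair. This is an assumption for illustration only: the function name `handshake_with_child()` is hypothetical, and a POSIX semaphore or a pipe would serve equally well.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Parent plays the "qemu" role and waits; child plays the "daemon" role and
 * signals readiness. Returns 0 once the parent has seen the signal, -1 on
 * error. A real implementation would also need to handle a child that dies
 * before signalling, since read() on an eventfd never reports EOF.
 */
static int handshake_with_child(void)
{
    int efd = eventfd(0, EFD_SEMAPHORE);
    if (efd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(efd);
        return -1;
    }

    if (pid == 0) {
        /* Child: finish setup here, then post the semaphore once. */
        uint64_t one = 1;
        ssize_t w = write(efd, &one, sizeof(one));
        _exit(w == (ssize_t)sizeof(one) ? 0 : 1);
    }

    /* Parent: block until the child posts. */
    uint64_t value = 0;
    ssize_t n = read(efd, &value, sizeof(value));

    int status = 0;
    waitpid(pid, &status, 0);
    close(efd);

    if (n != (ssize_t)sizeof(value))
        return -1;

    /* In EFD_SEMAPHORE mode a successful read always yields exactly 1. */
    assert(value == 1);
    return 0;
}
```

If the daemon is folded into qemu as a thread, the same descriptor can be shared directly and the fork() goes away.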
> KVM_VMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process
Equivalent to the fault callback of CUSE (if we add it)?
> KVM_VMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source
>   for daemon
Equivalent to returning from that callback with a new page?
> KVM_VMEM_MAKE_PAGES_PRESENT: make the specified pages present in qemu
>                              virtual address space
>   for daemon uses
> KVM_VMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
>                              anonymous
>   I'm not sure whether this can be implemented or not.
>
> I think the following work flow on the destination helps.
>
>         qemu on the destination
>               |
>               V
>         open(/dev/kvm)
>               |
>               V
>         KVM_CREATE_VMEM_DEV
>               |
>               V
>         Here we have two file descriptors to
>         vmem device and shmem file
>               |
>               |
>               |                                  daemon on the destination
>               V
>         fork()---------------------------------------,
>               |                                      |
>               V                                      |
>         close(socket)                                V
>         close(shmem)                              mmap(shmem file)
>               |                                      |
>               V                                      V
>         mmap(vmem device) for guest RAM           close(shmem file)
>               |                                      |
>               V                                      |
>         KVM_VMEM_READY_WAIT<---------------------KVM_VMEM_READY
>               |                                      |
>               V                                      |
>         close(vmem device)                        Here the daemon takes over
>               |                                   the owner of the socket
>         entering post copy stage                  to the source
>         start guest execution                        |
>               |                                      |
>               V                                      V
>         access guest RAM                          KVM_VMEM_GET_PAGE_REQUEST
>               |                                      |
>               V                                      V
>         page fault ------------------------------>page offset is returned
>         block                                        |
>                                                      V
>                                                   pull page from the source
>                                                   write the page contents
>                                                   to the shmem.
>                                                      |
>                                                      V
>         unblock<-----------------------------KVM_VMEM_MARK_PAGE_CACHED
>         the fault handler returns the page
>         page fault is resolved
>               |
>               |                                   pages can be pulled
>               |                                   in the background
>               |                                      |
>               |                                      V
>               |                                   KVM_VMEM_MARK_PAGE_CACHED
>               |                                      |
>               V                                      V
>         The specified pages<----------------------KVM_VMEM_MAKE_PAGES_PRESENT
>         are made present                             |
>         so future page fault is avoided.             |
>               |                                      |
>               V                                      V
>
>                  all the pages are pulled from the source
>
>               |                                      |
>               V                                      V
>         the vma becomes anonymous<----------------KVM_VMEM_MAKE_VMA_ANONYMOUS
>        (note: I'm not sure if this can be implemented or not)
>               |                                      |
>               V                                      V
>         migration completes                        exit()
Yes, thanks, this was very helpful.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.