On Mon, Aug 15, 2011 at 12:29:37PM -0700, Avi Kivity wrote:
> On 08/12/2011 04:07 AM, Isaku Yamahata wrote:
>> This is a character device to hook page access.
>> The page fault in the area is reported to another user process by
>> this chardriver. Then, the process fills the page contents and
>> resolves the page fault.
>
> Have you considered CUSE (character device in userspace, fs/fuse/cuse.c)?

From looking at dev.c and cuse.c, CUSE doesn't seem to support mmap or
a fault handler.

>
>> index 55f5afb..623109e 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -554,6 +554,7 @@ struct kvm_ppc_pvinfo {
>>  #define KVM_CAP_PPC_SMT 64
>>  #define KVM_CAP_PPC_RMA 65
>>  #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */
>> +#define KVM_CAP_POST_COPY_MEMORY 67
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
>> @@ -760,6 +761,50 @@ struct kvm_clock_data {
>>  /* Available with KVM_CAP_RMA */
>>  #define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma)
>>
>> +struct kvm_vmem_create {
>> +	__u64 size;	/* in bytes */
>> +	__s32 vmem_fd;
>> +	__s32 shmem_fd;
>> +};
>
> Should really be outside kvm.h (and virt/kvm), since it's not kvm specific.

Okay. I'll un-kvm it.

>> +
>> +struct kvm_vmem_page_request {
>> +	__u32 nr;
>> +	__u64 __user *pgoffs;
>> +};
>> +
>> +struct kvm_vmem_page_cached {
>> +	__u32 nr;
>> +	__u64 __user *pgoffs;
>> +};
>> +
>> +struct kvm_vmem_page_range {
>> +	__u64 pgoff;
>> +	__u64 nr_pages;
>> +};
>> +
>> +struct kvm_vmem_make_pages_present {
>> +	__u32 nr;
>> +	struct kvm_vmem_page_range __user *ranges;
>> +};
>
> This is madvise(MADV_WILLNEED), is it not?

Not quite. Another process, not the qemu process, issues it, and it
makes the pages present in the qemu process's address space.
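To illustrate the argument layout (a rough sketch only, not code from the
patch): before issuing KVM_VMEM_MAKE_PAGES_PRESENT, the daemon could fold
the page offsets it has already pulled into contiguous ranges. The helper
name coalesce_pgoffs and the userspace struct vmem_page_range are made up
for this example; only the pgoff/nr_pages layout comes from the proposed
struct kvm_vmem_page_range, and the input is assumed to be sorted in
ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the proposed struct kvm_vmem_page_range.
 * Kernel types (__u64) are replaced with stdint types for illustration.
 */
struct vmem_page_range {
	uint64_t pgoff;		/* first page offset of the run */
	uint64_t nr_pages;	/* number of consecutive pages in the run */
};

/*
 * Hypothetical helper: fold an ascending list of page offsets into
 * contiguous ranges. Returns the number of ranges written to out and
 * stops early once max_ranges entries are in use and a new run starts.
 */
size_t coalesce_pgoffs(const uint64_t *pgoffs, size_t nr,
		       struct vmem_page_range *out, size_t max_ranges)
{
	size_t n = 0;

	assert(pgoffs != NULL || nr == 0);
	assert(out != NULL || max_ranges == 0);

	for (size_t i = 0; i < nr; i++) {
		/* Extend the current run if this offset directly follows it. */
		if (n > 0 && out[n - 1].pgoff + out[n - 1].nr_pages == pgoffs[i]) {
			out[n - 1].nr_pages++;
			continue;
		}
		/* A new run is needed; give up if the output array is full. */
		if (n == max_ranges)
			break;
		out[n].pgoff = pgoffs[i];
		out[n].nr_pages = 1;
		n++;
	}
	return n;
}
```

With offsets 3, 4, 5, 9, 10, 20 this yields three ranges: (3, 3), (9, 2)
and (20, 1). The resulting array and count would then be what the daemon
passes as ranges and nr in struct kvm_vmem_make_pages_present.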
>> +
>> +/* Available with KVM_CAP_POST_COPY_MEMORY */
>> +#define KVM_CREATE_VMEM_DEV		_IO(KVMIO, 0xb0)
>> +
>> +/* ioctl for vmem_dev fd */
>> +#define KVM_CREATE_VMEM		_IOR(KVMIO, 0xb1, __u32)
>> +
>> +/* ioctl for vmem fd */
>> +#define KVM_VMEM_WAIT_READY		_IO(KVMIO, 0xb2)
>> +#define KVM_VMEM_READY			_IO(KVMIO, 0xb3)
>> +#define KVM_VMEM_GET_PAGE_REQUEST \
>> +	_IOWR(KVMIO, 0xb4, struct kvm_vmem_page_request)
>> +#define KVM_VMEM_MARK_PAGE_CACHED \
>> +	_IOW(KVMIO, 0xb5, struct kvm_vmem_page_cached)
>> +#define KVM_VMEM_MAKE_PAGES_PRESENT \
>> +	_IOW(KVMIO, 0xb6, struct kvm_vmem_make_pages_present)
>> +#define KVM_VMEM_MAKE_VMA_ANONYMOUS	_IO(KVMIO, 0xb7)
>
> Can you explain these in some more detail?

KVM_CREATE_VMEM_DEV: create the vmem-dev device from the kvm device.
                     Used by qemu.
KVM_CREATE_VMEM: create a vmem device from the vmem-dev device.
                 (note: qemu creates more than one memory region.)
KVM_VMEM_WAIT_READY: wait for KVM_VMEM_READY.
                     Used by qemu.
KVM_VMEM_READY: unblock KVM_VMEM_WAIT_READY.
                Used by the daemon.
These are for qemu and the daemon to synchronise on entering the
postcopy stage.

KVM_VMEM_GET_PAGE_REQUEST: retrieve the page faults of the qemu process.
KVM_VMEM_MARK_PAGE_CACHED: mark the specified pages as pulled from the
                           source.
                           Used by the daemon.
KVM_VMEM_MAKE_PAGES_PRESENT: make the specified pages present in the
                             qemu virtual address space.
                             Used by the daemon.
KVM_VMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
                             anonymous.
                             I'm not sure whether this can be
                             implemented or not.

I think the following workflow on the destination helps.

    qemu on the destination
          |
          V
    open(/dev/kvm)
          |
          V
    KVM_CREATE_VMEM_DEV
          |
          V
    Here we have two file descriptors to
    vmem device and shmem file
          |
          |
          |                                  daemon on the destination
          V
    fork()---------------------------------------,
          |                                      |
          V                                      |
    close(socket)                                V
    close(shmem)                              mmap(shmem file)
          |                                      |
          V                                      V
    mmap(vmem device) for guest RAM           close(shmem file)
          |                                      |
          V                                      |
    KVM_VMEM_WAIT_READY <---------------------KVM_VMEM_READY
          |                                      |
          V                                      |
    close(vmem device)                        Here the daemon takes over
          |                                   ownership of the socket
    entering post copy stage                  to the source
    start guest execution                        |
          |                                      |
          V                                      V
    access guest RAM                          KVM_VMEM_GET_PAGE_REQUEST
          |                                      |
          V                                      V
    page fault ------------------------------>page offset is returned
    block                                        |
                                                 V
                                              pull page from the source
                                              write the page contents
                                              to the shmem.
                                                 |
                                                 V
    unblock     <-----------------------------KVM_VMEM_MARK_PAGE_CACHED
    the fault handler returns the page
    page fault is resolved
          |
          |                                   pages can be pulled
          |                                   in the background
          |                                      |
          |                                      V
          |                                   KVM_VMEM_MARK_PAGE_CACHED
          |                                      |
          V                                      V
    The specified pages<----------------------KVM_VMEM_MAKE_PAGES_PRESENT
    are made present                             |
    so future page fault is avoided.             |
          |                                      |
          V                                      V

             all the pages are pulled from the source

          |                                      |
          V                                      V
    the vma becomes anonymous<----------------KVM_VMEM_MAKE_VMA_ANONYMOUS
   (note: I'm not sure if this can be implemented or not)
          |                                      |
          V                                      V
    migration completes                        exit()

thanks,
-- 
yamahata