On 13 January 2012 02:15, Isaku Yamahata <yamahata@xxxxxxxxxxxxx> wrote:
> One more question.
> Does your architecture/implementation (in theory) allow KVM memory
> features like swap, KSM, THP?

* Swap: yes, we support swap to disk (the page is pulled from swap
  before being sent over), and the swap process does its job on the
  other side.

* KSM: same, we support KSM. A KSM-merged page is broken down and its
  copies are sent individually (sub-optimal, yes, but it keeps the
  protocol less messy), and we let the KSM daemon do its job on the
  other side.

* THP: stickier here. Due to time constraints we decided to support it
  only partially. What this means: if we encounter a THP we break it
  down to standard page granularity, as that is the memory unit we
  currently manipulate. As a result you can have THP on the source but
  you won't have THP on the other side.
  Note: we haven't fully explored the ramifications of THP with RDMA; I
  don't know whether THP plays well with the MMU of a hardware RDMA
  NIC. One thing I would like to explore is whether it is possible to
  break a THP down into standard pages and then reassemble them on the
  other side (does anyone know if it is possible to aggregate pages to
  form a THP in the kernel?).

* cgroup: should work transparently, but we need to do more testing to
  confirm that.

> On Fri, Jan 13, 2012 at 11:03:23AM +0900, Isaku Yamahata wrote:
>> Very interesting. We can cooperate for better (postcopy) live migration.
>> The code doesn't seem to be available yet; I'm eager to see it.
>>
>>
>> On Fri, Jan 13, 2012 at 01:09:30AM +0000, Benoit Hudzia wrote:
>> > Hi,
>> >
>> > Sorry to hijack the thread like this, but I would like to inform
>> > you that we recently reached a milestone in the research project I'm
>> > leading: we enhanced KVM to deliver post-copy live migration using
>> > RDMA at the kernel level.
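To make the THP handling above concrete: the point is that transfer always
happens at standard-page granularity, so a 2 MiB huge-page-backed region is
simply carved into 512 ordinary 4 KiB units before being sent. This is a
minimal illustrative sketch of that chunking (the helper name and the code
itself are hypothetical, not the actual module code):

```python
HUGE_PAGE = 2 * 1024 * 1024  # typical x86-64 THP size
BASE_PAGE = 4 * 1024         # standard page size

def split_into_base_pages(offset, length, page_size=BASE_PAGE):
    """Yield (offset, size) transfer units covering [offset, offset+length)
    at base-page granularity, as the protocol sends pages one by one."""
    end = offset + length
    cur = offset
    while cur < end:
        step = min(page_size, end - cur)
        yield (cur, step)
        cur += step

# A THP-backed region becomes 512 standard-page transfer units.
units = list(split_into_base_pages(0, HUGE_PAGE))
```

The open question in the mail, reassembling those 4 KiB units back into a
THP on the destination, is not addressed here; in-kernel collapse of base
pages into a huge page is what khugepaged does opportunistically.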
>> >
>> > A few points on the architecture of the system:
>> >
>> > * RDMA communication engine in the kernel (you can use Soft-iWARP
>> >   or Soft-RoCE if you don't have hardware acceleration; we also
>> >   support standard RDMA-enabled NICs).
>>
>> Do you mean the InfiniBand subsystem?
>>
>> > * Naturally, pages are transferred with a zero-copy protocol.
>> > * Leverages the async page fault system.
>> > * Pre-paging / faulting.
>> > * No context switch, as everything is handled within the kernel
>> >   using the page fault system.
>> > * Hybrid migration (pre- + post-copy) available.
>>
>> Ah, I've also been planning this.
>> After the pre-copy phase, is the dirty bitmap sent?
>>
>> So far I've thought naively that the pre-copy phase would be finished
>> by a number of iterations. Your choice, on the other hand, is a
>> timeout for the pre-copy phase. Do you have a rationale, or was it
>> just natural for you?
>>
>> > * Relies on an independent kernel module.
>> > * No modification to the KVM kernel module.
>> > * Minimal modification to the qemu-kvm code.
>> > * We plan to add a page prioritization algorithm to optimize the
>> >   pre-paging and background transfer.
>>
>> Where do you plan to implement it? In qemu or in your kernel module?
>> This algorithm could be shared.
>>
>> Thanks in advance.
>>
>> > You can learn a little more and see a demo here:
>> > http://tinyurl.com/8xa2bgl
>> > I hope to be able to provide more detail on the design soon, as
>> > well as a more concrete demo of the system (live migration of a VM
>> > running large enterprise apps such as an ERP or an in-memory DB).
>> >
>> > Note: this is just a stepping stone, as post-copy live migration
>> > mainly enables us to validate the architecture design and code.
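On the dirty-bitmap question above: in a hybrid scheme, when pre-copy stops
(by timeout or iteration count), the source can hand the destination one bit
per guest page marking what was re-dirtied and therefore must be fetched by
post-copy. A minimal sketch of such a bitmap follows (the class and method
names are hypothetical, not from either implementation):

```python
class DirtyBitmap:
    """One bit per guest page frame; a sketch of the bitmap that could be
    sent to the destination at the pre-copy -> post-copy switch-over."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def mark_dirty(self, pfn):
        # Set the bit for page frame number `pfn`.
        self.bits[pfn >> 3] |= 1 << (pfn & 7)

    def is_dirty(self, pfn):
        return bool(self.bits[pfn >> 3] & (1 << (pfn & 7)))

    def dirty_pages(self):
        # Pages the destination still has to pull (or the source pushes
        # in the background, guided by a prioritization algorithm).
        return [p for p in range(self.num_pages) if self.is_dirty(p)]

bm = DirtyBitmap(1024)
bm.mark_dirty(0)
bm.mark_dirty(513)
```

The compact `bytearray` form is what makes it cheap to ship over the wire
at the switch-over point; the destination only faults on set bits.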
>> >
>> > Regards
>> > Benoit
>> >
>> >
>> > On 12 January 2012 13:59, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> > > On 01/04/2012 05:03 AM, Isaku Yamahata wrote:
>> > >> Yes, it's quite doable in user space (qemu) with a kernel
>> > >> enhancement. And it would be easy to convert a separate daemon
>> > >> process into a thread in qemu.
>> > >>
>> > >> I think it should be done outside of the qemu process for some
>> > >> reasons. (I just repeat the same discussion from the KVM Forum
>> > >> because no one remembers it.)
>> > >>
>> > >> - ptrace (and its variants)
>> > >>   Some people want to investigate guest RAM on the host (with
>> > >>   qemu stopped or live). For example, enhance the crash utility
>> > >>   so it can attach to the qemu process and debug the guest
>> > >>   kernel.
>> > >
>> > > To debug the guest kernel you don't need to stop qemu itself. I
>> > > agree it's a problem for qemu debugging though.
>> > >
>> > >> - core dump
>> > >>   The qemu process may core-dump. For postmortem analysis, people
>> > >>   want to investigate guest RAM: again, enhance the crash utility
>> > >>   so it can read the core file and analyze the guest kernel. When
>> > >>   the core is created, the qemu process is already dead.
>> > >
>> > > Yes, strong point.
>> > >
>> > >> It precludes the above possibilities to handle the fault in the
>> > >> qemu process.
>> > >
>> > > I agree.
>> > >
>> > >
>> > > --
>> > > error compiling committee.c: too many arguments to function
>> > >
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> > --
>> > "The production of too many useful things results in too many useless people"
>>
>> --
>> yamahata
>
> --
> yamahata

--
"The production of too many useful things results in too many useless people"