On 13 January 2012 02:03, Isaku Yamahata <yamahata@xxxxxxxxxxxxx> wrote:
> Very interesting. We can cooperate for better (postcopy) live migration.
> The code doesn't seem to be available yet; I'm eager for it.
>
>
> On Fri, Jan 13, 2012 at 01:09:30AM +0000, Benoit Hudzia wrote:
>> Hi,
>>
>> Sorry to hijack the thread like this, however I would like
>> to inform you that we recently achieved a milestone in the
>> research project I'm leading. We enhanced KVM in order to deliver
>> post-copy live migration using RDMA at kernel level.
>>
>> A few points on the architecture of the system:
>>
>> * RDMA communication engine in kernel (you can use soft iWARP or soft
>>   RoCE if you don't have hardware acceleration, however we also support
>>   standard RDMA-enabled NICs).
>
> Do you mean infiniband subsystem?

Yes, basically any software or hardware implementation that supports
the standard RDMA / OFED verbs stack in kernel.

>
>
>> * Naturally, pages are transferred with a zero-copy protocol
>> * Leverage the async page fault system.
>> * Pre-paging / faulting
>> * No context switch, as everything is handled within the kernel and
>>   using the page fault system.
>> * Hybrid migration (pre + post copy) available
>
> Ah, I've also been planning this.
> After pre-copy phase, is the dirty bitmap sent?

Yes, we send over the dirty bitmap in order to identify what is left
to be transferred. Combined with the priority algo, we then
prioritise the pages for the background transfer.

>
> So far I've thought naively that pre-copy phase would be finished by the
> number of iterations. On the other hand your choice is timeout of
> pre-copy phase. Do you have rationale? or it was just natural for you?

The main rationale behind that is that any normal sysadmin tends to be
human, and a live migration iteration cycle has no meaning for him. As a
result we preferred to provide a time constraint rather than an
iteration constraint.
Also, it is hard to estimate how much time and bandwidth would be used
per iteration cycle, which leads to poor determinism.

>
>
>> * Rely on an independent kernel module
>> * No modification to the KVM kernel module
>> * Minimal modification to the qemu-kvm code
>> * We plan to add the page prioritization algo in order to optimise the
>>   pre-paging algo and background transfer
>
> Where do you plan to implement? in qemu or in your kernel module?
> This algo could be shared.

Yes, we actually plan to release the algo first, before the RDMA post
copy. The algo can be used for standard optimisation of the normal
pre-copy process (as demonstrated in my talk at KVM Forum) and, with the
priority reversed, for the post-copy page pull. My colleague Aidan
Shribman is done with the implementation, and we are now in the testing
phase in order to quantify the improvement.

>
> thanks in advance.
>
>> You can learn a little bit more and see a demo here:
>> http://tinyurl.com/8xa2bgl
>> I hope to be able to provide more detail on the design soon, as well
>> as a more concrete demo of the system (live migration of VMs running
>> large enterprise apps such as ERP or in-memory DBs).
>>
>> Note: this is just a stepping stone, as the post-copy live migration
>> mainly enables us to validate the architecture design and code.
>>
>> Regards
>> Benoit
>>
>>
>> On 12 January 2012 13:59, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> > On 01/04/2012 05:03 AM, Isaku Yamahata wrote:
>> >> Yes, it's quite doable in user space (qemu) with a kernel enhancement.
>> >> And it would be easy to convert a separate daemon process into a thread
>> >> in qemu.
>> >>
>> >> I think it should be done outside of the qemu process for some reasons.
>> >> (I just repeat the same discussion from the KVM Forum because no one
>> >> remembers it)
>> >>
>> >> - ptrace (and its variants)
>> >>   Some people want to investigate guest RAM on the host (qemu stopped or live).
>> >>   For example, enhance the crash utility so that it attaches to the
>> >>   qemu process and debugs the guest kernel.
>> >
>> > To debug the guest kernel you don't need to stop qemu itself. I agree
>> > it's a problem for qemu debugging though.
>> >
>> >>
>> >> - core dump
>> >>   The qemu process may core-dump.
>> >>   As postmortem analysis, people want to investigate guest RAM.
>> >>   Again, enhance the crash utility so that it reads the core file and
>> >>   analyzes the guest kernel.
>> >>   When the core is created, the qemu process is already dead.
>> >
>> > Yes, strong point.
>> >
>> >> Handling the fault in the qemu process precludes the above possibilities.
>> >
>> > I agree.
>> >
>> >
>> > --
>> > error compiling committee.c: too many arguments to function
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> "The production of too many useful things results in too many useless people"
>
> --
> yamahata

--
"The production of too many useful things results in too many useless people"
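
To make the hybrid scheme discussed in this thread a little more concrete, here is a minimal Python sketch of a pre-copy phase that is cut off by a time budget rather than an iteration count. When the budget runs out, the remaining dirty set (standing in for the dirty bitmap) is handed over and ordered by priority for the background transfer. This is a toy simulation, not the kernel module or qemu code: the name hybrid_migrate, all of its parameters, the simulated clock, and the priority function are assumptions made purely for illustration.

```python
from typing import Callable, List, Set, Tuple


def hybrid_migrate(
    num_pages: int,
    time_budget: float,
    send_cost: float,
    dirtied_during: Callable[[int], Set[int]],
    priority: Callable[[int], float],
) -> Tuple[int, List[int]]:
    """Simulate time-bounded pre-copy followed by a post-copy hand-over.

    All parameters are hypothetical:
      time_budget     simulated seconds allowed for the pre-copy phase
      send_cost       simulated seconds needed to send one page
      dirtied_during  round number -> pages the guest wrote in that round
      priority        page number -> score, higher means "send sooner"

    Returns (pre-copy rounds started, pages left for post-copy in
    priority order).
    """
    clock = 0.0                    # simulated time, not wall-clock time
    dirty = set(range(num_pages))  # first round: every page must be sent
    rounds = 0
    while dirty and clock < time_budget:
        to_send = sorted(dirty)
        dirty = set()
        for page in to_send:
            if clock >= time_budget:
                dirty.add(page)    # budget exhausted: page stays dirty
                continue
            clock += send_cost     # "send" the page
        # Pages written by the guest while this round was running.
        dirty |= dirtied_during(rounds)
        rounds += 1
    # The dirty set plays the role of the dirty bitmap sent to the
    # destination; the priority decides the background transfer order.
    remaining = sorted(dirty, key=priority, reverse=True)
    return rounds, remaining


if __name__ == "__main__":
    # 8 pages, 10 s budget, 1 s per page; the guest keeps dirtying 1, 5, 6.
    result = hybrid_migrate(8, 10.0, 1.0, lambda r: {1, 5, 6}, lambda p: p)
    print(result)  # (2, [6, 5, 1])
```

In this sketch the admin only chooses time_budget. How many rounds fit into it depends on the guest's dirtying rate, which is the determinism argument made above for a time constraint over an iteration constraint.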