On Mon, Aug 08, 2011 at 03:38:54PM +0300, Avi Kivity wrote:
> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
>> on which we'll give a talk at KVM-forum.
>> The purpose of this mail is to let developers know about it in advance
>> so that we can get better feedback on its design/implementation approach
>> early, before we start implementing it.
>
> Interesting; what is the impact of increased latency on memory reads?

Many people have already discussed it at length in another thread. :-)
That's much more than I expected.

>> There are several design points.
>>   - who takes care of pulling page contents.
>>     an independent daemon vs a thread in qemu
>>     The daemon approach is preferable because an independent daemon would
>>     make it easy to debug the postcopy memory mechanism without qemu.
>>     If required, it wouldn't be difficult to convert a daemon into
>>     a thread in qemu
>
> Isn't this equivalent to touching each page in sequence?

No. I don't get the point of this question.

> Care must be taken that we don't post too many requests, or it could
> affect the latency of synchronous accesses by the guest.

Yes.

>>   - connection between the source and the destination
>>     The connection for live migration can be re-used after sending machine
>>     state.
>>
>>   - transfer protocol
>>     The existing protocol can be extended.
>>
>>   - hooking guest RAM access
>>     Introduce a character device to handle page faults.
>>     When a page fault occurs, it queues a page request to the user space
>>     daemon at the destination. The daemon pulls the page contents from the
>>     source and serves them into the character device. Then the page fault
>>     is resolved.
>
> This doesn't play well with host swapping, transparent hugepages, or
> ksm, does it?

No. At least it wouldn't be very difficult to fix, though I haven't looked
at ksm and thp closely.
Although the vma is backed by the device, the populated page is anonymous.
(by MAP_PRIVATE or the driver returning anonymous pages)
So swapping, thp, and ksm should work.

> I see you note this later on.
>
>> * More on hooking guest RAM access
>>   There are several candidates for the implementation. Our preference is
>>   the character device approach.
>>
>>   - inserting hooks everywhere in qemu/kvm
>>     This is impractical
>>
>>   - backing store for guest ram
>>     a block device or a file can be used to back guest RAM.
>>     Thus hook the guest ram access.
>>
>>     pros
>>     - new device driver isn't needed.
>>     cons
>>     - future improvement would be difficult
>>     - some KVM host features (KSM, THP) wouldn't work
>>
>>   - character device
>>     qemu mmap()s the dedicated character device, and then hooks page faults.
>>
>>     pros
>>     - straightforward approach
>>     - future improvement would be easy
>>     cons
>>     - new driver is needed
>>     - some KVM host features (KSM, THP) wouldn't work
>>       They check if a given VMA is anonymous. This can be fixed.
>>
>>   - swap device
>>     When creating the guest, it is set up as if all the guest RAM is
>>     swapped out to a dedicated swap device, which may be an nbd disk (or
>>     some kind of user space block device, BUSE?).
>>     When the VM tries to access memory, swap-in is triggered and IO to the
>>     swap device is issued. Then the IO to swap is routed to the daemon
>>     in user space with the nbd protocol (or BUSE, AOE, iSCSI...). The
>>     daemon pulls pages from the migration source and services the IO
>>     request.
>>
>>     pros
>>     - After the page transfer is complete, everything is the same as the
>>       normal case.
>>     - no new device driver is needed
>>     cons
>>     - future improvement would be difficult
>>     - administration: setting up nbd, swap device
>>
>
> Using a swap device would be my preference.  We'd still be using
> anonymous memory so thp/ksm/ordinary swap still work.
>
> It would need to be a special kind of swap device since we only want to
> swap in, and never out, to that device.  We'd also need a special way of
> telling the kernel that memory comes from that device.
> In that it's similar to your second option.
>
> Maybe we should use a backing file (using nbd) and have a madvise() call
> that converts the vma to anonymous memory once the migration is finished.

Whichever option we choose, I'd like to somehow convert the vma into an
anonymous area after the migration completes, i.e. by nulling vma->vm_ops.
(The pages are already anonymous.)
It seems troublesome, though, since it involves complicated races and
locking. So I'm not sure it's worthwhile.
--
yamahata
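
For reference, the fault-handling flow described under "hooking guest RAM
access" (a fault queues a page request, the daemon pulls the page from the
source, the page is served and the fault resolves) can be sketched as a small
user-space simulation. This is only an illustration under stated assumptions,
not the proposed implementation: the names SourceHost, PostcopyRam,
background_pull and budget are hypothetical, no real character device or
kernel interface is involved, and budget merely stands in for the concern
about not posting too many background requests.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size, for illustration only


class SourceHost:
    """Hypothetical stand-in for the migration source, which holds every guest page."""

    def __init__(self, pages):
        self._pages = dict(pages)
        self.requests_served = 0

    def page_numbers(self):
        return sorted(self._pages)

    def read_page(self, pfn):
        # In a real setup this would be a round trip over the migration connection.
        self.requests_served += 1
        return self._pages[pfn]


class PostcopyRam:
    """Hypothetical model of the device plus the destination daemon."""

    def __init__(self, source):
        self._source = source
        self._populated = {}     # pfn -> page contents already pulled
        self._pending = deque()  # page requests queued by faults

    def _serve_pending(self):
        # Daemon side: pull each requested page from the source and serve it.
        while self._pending:
            pfn = self._pending.popleft()
            if pfn not in self._populated:
                self._populated[pfn] = self._source.read_page(pfn)

    def read(self, pfn):
        # Guest access: a missing page "faults", queues a request and waits
        # until the daemon has served it.
        if pfn not in self._populated:
            self._pending.append(pfn)
            self._serve_pending()
        return self._populated[pfn]

    def background_pull(self, budget):
        # Pull at most `budget` not-yet-populated pages, so that background
        # transfer cannot flood the connection used by synchronous faults.
        pulled = 0
        for pfn in self._source.page_numbers():
            if pulled >= budget:
                break
            if pfn not in self._populated:
                self._populated[pfn] = self._source.read_page(pfn)
                pulled += 1
        return pulled

    def populated_count(self):
        return len(self._populated)
```

In this sketch a second read() of the same page never reaches the source,
and background_pull() with a small budget fills in the remaining pages
gradually instead of requesting them all at once.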