On Wed, Aug 10, 2011 at 04:55:32PM +0300, Avi Kivity wrote:
> On 08/09/2011 05:33 AM, Isaku Yamahata wrote:
>> On Mon, Aug 08, 2011 at 03:38:54PM +0300, Avi Kivity wrote:
>> > On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>> >> This mail is about "Yabusame: Postcopy Live Migration for Qemu/KVM",
>> >> on which we'll give a talk at the KVM Forum.
>> >> The purpose of this mail is to let developers know about it in advance
>> >> so that we can get better feedback on its design/implementation
>> >> approach early, before we start to implement it.
>> >
>> > Interesting; what is the impact of increased latency on memory reads?
>>
>> Many people have already discussed it a lot in another thread. :-)
>> That's much more than I expected.
>
> Can you point me to the discussion?

I misunderstood your question. Please refer to the papers, which include
evaluation results covering network latency; they discuss it in detail.
The presentation we will give at the KVM Forum also includes some results.

>> >> There are several design points.
>> >> - who takes care of pulling page contents.
>> >>   an independent daemon vs a thread in qemu
>> >>   The daemon approach is preferable because an independent daemon
>> >>   would make it easy to debug the postcopy memory mechanism without
>> >>   qemu. If required, it wouldn't be difficult to convert the daemon
>> >>   into a thread in qemu.
>> >
>> > Isn't this equivalent to touching each page in sequence?
>>
>> No. I don't get the point of this question.
>
> If you have a qemu thread that does
>
>     for (each guest page)
>         sum += *(char *)page;
>
> doesn't that effectively pull all pages from the source node?
>
> (but maybe I'm assuming that the kernel takes care of things and this
> isn't the case?)

Now I see your point. Right, it doesn't matter who starts the access to
guest RAM. My point is that, after the page fault, someone has to resolve
it by sending a request for the page to the migration source. Whether
that is a daemon or a thread isn't a big issue anyway. If nbd with a swap
device is used, the IO request may be sent to the source directly.
(A rough sketch of such a fault-service loop is appended at the end of
this mail.)

>> >> - hooking guest RAM access
>> >>   Introduce a character device to handle page faults.
>> >>   When a page fault occurs, it queues a page request up to the user
>> >>   space daemon at the destination, and the daemon pulls the page
>> >>   contents from the source and serves them into the character device.
>> >>   Then the page fault is resolved.
>> >
>> > This doesn't play well with host swapping, transparent hugepages, or
>> > ksm, does it?
>>
>> No. At least it wouldn't be so difficult to fix; I haven't looked at
>> ksm and thp so closely, though.
>> Although the vma is backed by the device, the populated pages are
>> anonymous (via MAP_PRIVATE or the driver returning anonymous pages),
>> so swapping, thp and ksm should work.
>
> I'm not 100% sure, but I think that thp and ksm need the vma to be
> anonymous, not just the page.

Yes, they seem to check that not only the page but also the vma is
anonymous. I'd like to hear from Andrea before digging into the code
deeply.

>> > It would need to be a special kind of swap device since we only want
>> > to swap in, and never out, to that device. We'd also need a special
>> > way of telling the kernel that memory comes from that device. In that
>> > it's similar to your second option.
>> >
>> > Maybe we should use a backing file (using nbd) and have a madvise()
>> > call that converts the vma to anonymous memory once the migration is
>> > finished.
>> Whichever option we pick, I'd like to convert the vma into an anonymous
>> area after the migration completes somehow, i.e. by nulling
>> vma->vm_ops. (The pages are already anonymous.)
>>
>> It seems troublesome, involving complicated races/lockings, so I'm not
>> sure it's worthwhile.
>
> Andrea, what's your take on this?

I'd also like to hear from those who are familiar with ksm/thp.
If it is possible to convert the vma into an anonymous one, then whether
it is a swap device or backed by a device/file wouldn't matter with
respect to ksm and thp. Would acquiring mmap_sem suffice?
(A rough sketch of what that conversion might look like is also appended
below.)

thanks,
--
yamahata
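
P.S. To make the character device approach more concrete, below is a very
rough userland sketch of the destination-side fault-service daemon.
Everything interface-specific here is a placeholder invented for this
mail -- the device name /dev/umem, the request/reply structures and the
wire protocol to the source are not a real interface -- it only
illustrates the intended cycle: block on a fault notification, pull the
page from the source, and serve it back into the device to resolve the
fault.

/*
 * Rough sketch only: /dev/umem, the request/reply layout and the wire
 * protocol below are placeholders invented for this mail.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define PAGE_SIZE 4096

struct umem_fault_req {           /* handed to user space by the device */
    uint64_t gpa;                 /* guest-physical address that faulted */
};

struct umem_fault_reply {         /* written back to resolve the fault */
    uint64_t gpa;
    char     data[PAGE_SIZE];
};

static int connect_to_source(const char *ip, uint16_t port)
{
    struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(port) };
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0 || inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
        connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("connect to migration source");
        exit(1);
    }
    return s;
}

int main(int argc, char **argv)
{
    int dev, src;
    struct umem_fault_req req;
    struct umem_fault_reply reply;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <source ip>\n", argv[0]);
        return 1;
    }

    dev = open("/dev/umem", O_RDWR);         /* placeholder device name */
    if (dev < 0) {
        perror("open /dev/umem");
        return 1;
    }
    src = connect_to_source(argv[1], 4444);  /* port is arbitrary here */

    for (;;) {
        /* 1. block until a vcpu faults on a not-yet-migrated page */
        if (read(dev, &req, sizeof(req)) != sizeof(req))
            break;

        /* 2. ask the source for that page and pull its contents
         *    (short reads/writes are ignored for brevity) */
        write(src, &req.gpa, sizeof(req.gpa));
        reply.gpa = req.gpa;
        read(src, reply.data, PAGE_SIZE);

        /* 3. serve the page into the device; this populates the guest RAM
         *    mapping and wakes up the faulting vcpu thread */
        write(dev, &reply, sizeof(reply));
    }
    return 0;
}

Whether this runs as a separate process or as a thread inside qemu only
changes who opens the device and the socket; the loop itself is the same.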
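
P.P.S. For the vma conversion question above, this is roughly the kind of
kernel-side helper I have in mind. The function is hypothetical, invented
for this mail; it deliberately ignores the races with faults still being
served through the device (the part that worries me) and simply assumes
that holding mmap_sem for write is enough.

/*
 * Hypothetical sketch, not real code: detach the postcopy vma(s) from the
 * character device once migration has finished, so that thp/ksm see a
 * plain anonymous vma.  The pages themselves are already anonymous.
 */
#include <linux/mm.h>
#include <linux/file.h>

static int postcopy_make_vma_anonymous(struct mm_struct *mm,
                                       unsigned long start, unsigned long end)
{
    struct vm_area_struct *vma;

    /* the open question: is taking mmap_sem for write enough to
     * serialize against faults still being resolved via the device? */
    down_write(&mm->mmap_sem);

    for (vma = find_vma(mm, start); vma && vma->vm_start < end;
         vma = vma->vm_next) {
        vma->vm_ops = NULL;             /* the "nulling vma->vm_ops" part */
        if (vma->vm_file) {
            fput(vma->vm_file);         /* drop the reference to the device */
            vma->vm_file = NULL;
        }
    }

    up_write(&mm->mmap_sem);
    return 0;
}

If something like Avi's madvise() idea is used, this would be called from
that madvise path once the last page has been pulled from the source.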