Hi Christopher,

On Thu, Jul 03, 2014 at 09:45:07AM -0400, Christopher Covington wrote:
> CRIU uses the soft dirty bit in /proc/pid/clear_refs and /proc/pid/pagemap to
> implement its pre-copy memory migration.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/soft-dirty.txt
>
> Would it make sense to use a similar interaction model of peeking and poking
> at /proc/pid/ files for post-copy memory migration facilities?

We plan to use the pagemap information to optimize precopy live
migration, but that is orthogonal to postcopy live migration. We
already combine precopy and postcopy live migration.

In addition to the dirty bit tracking provided by the soft-dirty
clear_refs feature, the pagemap bits can also tell us, for example,
which pages are missing in the source node, replacing the current
memcmp(0) that avoids transferring zero pages. With pagemap we can
skip a superfluous zero page fault (David suggested this).

Postcopy live migration poses a different problem. Without postcopy
there is no way to migrate 100GByte guests with heavy load inside
them; in fact even the first "optimistic" precopy pass should only
migrate those pages that already got the dirty bit set by the time we
attempt to send them. With postcopy we can also guarantee that the
maximum amount of data transferred during precopy+postcopy is twice
the size of the guest. So you know exactly the maximum time live
migration will take for a given network bandwidth, and it cannot fail
no matter the load or the size of the guest. Slowing down the guest
with autoconverge isn't needed anymore.

The userfault only happens in the destination node. The problem we
face is that we must start the guest in the destination node while a
significant amount of its memory is still in the source node. With
postcopy migration the pages aren't dirty or present in the
destination node, they're just holes, and in fact we already know
exactly which pages are missing without having to check pagemap. It's
up to the guest OS which pages it decides to touch, and we cannot
know that in advance. We already know where the holes are; we don't
know whether the guest will touch those holes during its runtime
while the memory is still externalized.

If the guest touches any hole we need to stop the guest somehow, and
we must be notified immediately so we can transfer the page, fill the
hole, and let the guest continue ASAP. pagemap/clear_refs cannot stop
the guest and notify us immediately when the guest touches a hole.

It's not just about the guest shadow MMU accesses: it could also be
O_DIRECT from qemu that triggers the fault, and in that case GUP
stops, we fill the hole, and then GUP and O_DIRECT succeed without
even noticing they have been stopped by a userfault.

Thanks,
Andrea
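
To make the pagemap point above concrete, here is a minimal sketch (not
code from qemu or from any posted patchset; the helper name and the
standalone main() are purely illustrative) of reading one
/proc/<pid>/pagemap entry and treating a page that is neither present
nor swapped as a hole that precopy can skip instead of memcmp()ing it
against zero:

/*
 * Sketch only: check the "present" (bit 63) and "swapped" (bit 62)
 * bits of one pagemap entry.  A page with neither bit set was never
 * touched, so it still reads as zero and precopy can skip it.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define PM_PRESENT	(1ULL << 63)
#define PM_SWAPPED	(1ULL << 62)

static int page_is_missing(int pagemap_fd, unsigned long vaddr)
{
	uint64_t entry;
	off_t off = (off_t)(vaddr / sysconf(_SC_PAGESIZE)) * sizeof(entry);

	if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
		return -1;
	return !(entry & (PM_PRESENT | PM_SWAPPED));
}

int main(int argc, char **argv)
{
	char path[64];

	if (argc < 3)
		return 1;
	snprintf(path, sizeof(path), "/proc/%s/pagemap", argv[1]);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return 1;
	unsigned long vaddr = strtoul(argv[2], NULL, 0);
	printf("page at %#lx missing: %d\n", vaddr,
	       page_is_missing(fd, vaddr));
	close(fd);
	return 0;
}

Reading another process's pagemap needs the appropriate privileges, and
the one-entry-at-a-time pread is only for clarity; a real precopy pass
would read the entries for a whole range in one go.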
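
For completeness, the soft-dirty cycle from the soft-dirty.txt document
quoted above (roughly what CRIU's pre-copy does) looks like the sketch
below; again the function names are illustrative only:

/*
 * Sketch of one soft-dirty tracking round, per
 * Documentation/vm/soft-dirty.txt: clear the bits, let the task run,
 * then test bit 55 of the pagemap entries.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define PM_SOFTDIRTY	(1ULL << 55)

/* step 1: clear the soft-dirty bits for every pte of the task */
static int clear_soft_dirty(pid_t pid)
{
	char path[64];
	int fd, ret;

	snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write(fd, "4", 1) == 1 ? 0 : -1;
	close(fd);
	return ret;
}

/* step 2, later: check whether a page was written since the clear */
static int page_was_dirtied(int pagemap_fd, unsigned long vaddr)
{
	uint64_t entry;
	off_t off = (off_t)(vaddr / sysconf(_SC_PAGESIZE)) * sizeof(entry);

	if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
		return -1;
	return !!(entry & PM_SOFTDIRTY);
}

A precopy loop would run clear_soft_dirty() once per pass and re-send
only the pages for which page_was_dirtied() returns 1 on the next pass.
Notice that this whole model is polling after the fact, which is
exactly what postcopy cannot live with.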
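
Finally, to illustrate the "stop the guest, fill the hole, continue"
step that pagemap/clear_refs cannot express, here is a rough sketch of
the destination-side loop written against a userfaultfd-style file
descriptor. The syscall, ioctl names, structure layouts and the
fetch_page_from_source() helper are assumptions for illustration, not
the interface as posted or reviewed in this thread:

/*
 * Illustrative sketch only: the destination node registers the guest
 * RAM with a userfault file descriptor, then a thread blocks reading
 * fault events.  When a vcpu (or GUP on behalf of O_DIRECT) touches a
 * hole, the thread pulls the page from the source node and fills the
 * hole atomically, which wakes up whoever faulted.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/* hypothetical transport helper: fetch one page from the source node */
static void fetch_page_from_source(unsigned long gpa_offset, void *buf,
				   size_t len)
{
	memset(buf, 0, len);	/* placeholder for the migration socket read */
	(void)gpa_offset;
}

static void postcopy_listen(void *guest_base, size_t guest_size)
{
	long psize = sysconf(_SC_PAGESIZE);
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);

	struct uffdio_api api = { .api = UFFD_API };
	ioctl(uffd, UFFDIO_API, &api);

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)guest_base,
			   .len   = guest_size },
		.mode  = UFFDIO_REGISTER_MODE_MISSING,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	char page[psize];

	for (;;) {
		struct uffd_msg msg;

		/* blocks until something touches a hole */
		if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		unsigned long addr = msg.arg.pagefault.address &
				     ~(psize - 1);

		fetch_page_from_source(addr - (unsigned long)guest_base,
				       page, psize);

		struct uffdio_copy copy = {
			.dst = addr,
			.src = (unsigned long)page,
			.len = psize,
		};
		/* fills the hole and wakes the faulting task in one shot */
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
}

The faulting side never has to know what happened: a vcpu just sees a
slow memory access, and GUP/O_DIRECT simply completes once the hole has
been filled.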