On 25/09/20 19:30, Ben Gardon wrote:
> Oh, thank you for explaining that. I didn't realize the goal here was
> to improve LM performance. I was under the impression that this was to
> give VMs a better experience on startup for fast scaling or something.
> In your testing with live migration how has this affected the
> distribution of time between the phases of live migration? Just for
> terminology (since I'm not sure how standard it is across the
> industry) I think of a live migration as consisting of 3 stages:
> precopy, blackout, and postcopy. In precopy we're tracking the VM's
> working set via dirty logging and sending the contents of its memory
> to the target host. In blackout we pause the vCPUs on the source, copy
> minimal data to the target, and resume the vCPUs on the target. In
> postcopy we may still have some pages that have not been copied to the
> target and so request those in response to vCPU page faults via user
> fault fd or some other mechanism.
>
> Does EPT pre-population preclude the use of a postcopy phase?

I think so.

As a quick recap, postcopy migration handles two kinds of pages: they
can be copied to the destination either in the background (stuff that
was dirty when userspace decided to transition to the blackout phase)
or on demand (relayed from KVM to userspace via get_user_pages and
userfaultfd). Normally only on-demand pages would be served through
userfaultfd, while with prepopulation every missing page would be
faulted in from the kernel through userfaultfd. In practice this would
just extend the blackout phase.

Paolo

> I would
> expect that to make the blackout phase really long. Has that not been
> a problem for you?
>
> I love the idea of partial EPT pre-population during precopy if you
> could still handle postcopy and just pre-populate as memory came in.
>
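
To make the recap of postcopy page handling concrete, here is a toy model of the difference between on-demand faulting and prepopulation. It is only a sketch under stated assumptions: the function name, the page sets, and the 8-page guest are hypothetical, and nothing here touches KVM or the real userfaultfd interface. It simply counts how many missing pages would have to be requested through the fault path in each case.

```python
def userfault_requests(total_pages, received, guest_touched, prepopulate):
    """Count pages that would be requested through the fault path.

    Toy model only: pages are plain integers, `received` holds pages
    already present on the destination, and `guest_touched` holds pages
    the guest accesses before background copying would deliver them.
    """
    missing = set(range(total_pages)) - set(received)
    if prepopulate:
        # Prepopulation walks all of guest memory, so every page that
        # has not arrived yet is faulted in.
        faulting = missing
    else:
        # Without prepopulation only the pages the guest actually
        # touches fault; the rest arrive via background copying.
        faulting = missing & set(guest_touched)
    return len(faulting)


# Hypothetical 8-page guest: pages 0-3 already arrived during precopy.
received = {0, 1, 2, 3}
guest_touched = {2, 4, 5}

on_demand = userfault_requests(8, received, guest_touched, prepopulate=False)
prepopulated = userfault_requests(8, received, guest_touched, prepopulate=True)
print(on_demand, prepopulated)  # prints "2 4"
```

In this model the prepopulated case faults on every missing page regardless of what the guest touches, which is the sense in which prepopulation would effectively extend the blackout phase.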