On Dec 31, 2013, at 4:23 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:

> On Tue, Dec 17, 2013 at 01:59:04PM +0800, Xiao Guangrong wrote:
>>
>> CCed KVM guys.
>>
>> On 05/10/2013 01:11 PM, Stefan Hajnoczi wrote:
>>> On Fri, May 10, 2013 at 4:28 AM, wenchao <wenchaolinux@xxxxxxxxx> wrote:
>>>> On 2013-5-9 22:13, Mel Gorman wrote:
>>>>>
>>>>> On Thu, May 09, 2013 at 05:50:05PM +0800, wenchaolinux@xxxxxxxxx wrote:
>>>>>>
>>>>>> From: Wenchao Xia <wenchaolinux@xxxxxxxxx>
>>>>>>
>>>>>> This series tries to enable the mremap syscall to CoW private memory
>>>>>> regions, just like fork() does. As a result, a user-space application
>>>>>> gets a mirror of those regions, which can be used as a snapshot for
>>>>>> further processing.
>>>>>>
>>>>>
>>>>> Why not just fork()? Even if the application is threaded it should be
>>>>> manageable to handle fork just for processing the private memory region
>>>>> in question. I'm having trouble figuring out what sort of application
>>>>> would require an interface like this.
>>>>>
>>>> It has some problems: parent-child communication and, sometimes, page
>>>> copies. I'd like to snapshot a QEMU guest's RAM; the current solution is:
>>>> 1) fork()
>>>> 2) pipe the guest RAM data from the child to the parent.
>>>> 3) the parent writes the contents out.
>>>>
>>>> To avoid complex communication for data control and file content
>>>> protection, the parent rather than the child handles the data through a
>>>> pipe, but this brings an additional copy. I think an explicit API that
>>>> CoW-maps a memory region inside one process could avoid that copy, run
>>>> faster, CoW fewer pages, and make the user-space code nicer.
>>>
>>> A new Linux-specific API is not portable and not available on existing
>>> hosts. Since QEMU supports non-Linux host operating systems the
>>> fork() approach is preferable.
>>>
>>> If you're worried about the memory copy - which should be benchmarked -
>>> then vmsplice(2) can be used in the child process and splice(2) can
>>> be used in the parent. It probably doesn't help though, since QEMU
>>> scans RAM pages to find all-zero pages before sending them over the
>>> socket, and at that point the memory copy might not make much
>>> difference.
>>>
>>> Perhaps other applications can use this new flag better, but for QEMU
>>> I think fork()'s portability is more important than the convenience of
>>> accessing the CoW pages in the same process.
>>
>> Yes, I agree with you that a new syscall is not always a good solution.
>>
>> Currently we're working on live update [1], which will be enabled in QEMU
>> first. This feature lets the guest run on a new QEMU binary smoothly
>> without a restart, which is good for applying security updates.
>>
>> In this case we need to move the guest memory from the old QEMU instance
>> to the new one. fork() cannot help, because we need to exec() a new
>> instance, and after that all memory mappings are destroyed.
>>
>> We tried to enable SPLICE_F_MOVE [2] for vmsplice() to move the memory
>> without a memory copy, but the performance is not as good as we expected,
>> due to some limitations: the page size, locking, the message-size limit
>> on pipes, etc. Of course we will continue to improve this, but wenchao's
>> patch seems like a new direction for us.
>>
>> To coordinate with your fork() approach, maybe we can introduce a new VMA
>> flag, something like VM_KEEP_ONEXEC, to tell exec() not to destroy that
>> VMA. How about this, or do you have another idea? We'd really appreciate
>> your suggestions.
>>
>> [1] http://marc.info/?l=qemu-devel&m=138597598700844&w=2
>> [2] https://lkml.org/lkml/2013/10/25/285
>
> Hi,

Hi Marcelo,

> What is the purpose of snapshotting guest RAM here, in the context of
> local migration?

RAM snapshotting and local migration are separate things. The reason I
asked for your suggestions here is that I thought they both need to do the
same thing: move memory from one process to another in an efficient way.
What's your idea? :)
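
For concreteness, here is a minimal sketch of the fork()+pipe scheme wenchao
describes above (fork, stream the child's copy-on-write view of guest RAM
through a pipe, let the parent write it out). The names `region`,
`region_size`, and `path` are placeholders and error handling is abbreviated;
this illustrates the idea under those assumptions and is not QEMU's actual
snapshot code.

```c
/*
 * Sketch of the fork()+pipe snapshot described in the thread.
 * `region`, `region_size` and `path` are placeholders; errors are terse.
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int snapshot_region(const void *region, size_t region_size,
                           const char *path)
{
    int pipefd[2];

    if (pipe(pipefd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: its CoW view of `region` is the frozen snapshot.
         * Stream it into the pipe and exit. */
        const char *p = region;
        size_t left = region_size;

        close(pipefd[0]);
        while (left > 0) {
            ssize_t n = write(pipefd[1], p, left);
            if (n < 0)
                _exit(1);
            p += n;
            left -= (size_t)n;
        }
        _exit(0);
    }

    /* Parent: keeps running the guest, drains the pipe and writes the
     * snapshot to disk -- this read()/write() loop is the extra copy
     * the thread is discussing. */
    close(pipefd[1]);

    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0)
        return -1;

    char buf[1 << 16];
    ssize_t n;
    while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
        if (write(out, buf, n) != n)
            break;
    }

    close(out);
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```

The child's address space is the consistent snapshot; the parent keeps the
guest running and pays an extra copy through `buf`, which is exactly what
motivates the zero-copy variants discussed in the thread.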
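
Stefan's vmsplice(2)/splice(2) suggestion would replace the parent's
read()/write() loop above: the child pushes references to its pages into the
pipe with vmsplice(), and the parent splices them straight to the output file
descriptor without an intermediate userspace buffer. Another rough sketch
under the same placeholder names; whether a copy is actually avoided depends
on the flags and on the pipe limitations (page size, locking, message size)
that Xiao mentions, and this is not the SPLICE_F_MOVE-for-vmsplice patch
itself.

```c
/*
 * Sketch of the vmsplice()/splice() variant suggested by Stefan: the
 * child feeds page references into the pipe, the parent moves them to
 * the snapshot file without an intermediate userspace buffer.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Child side: push `region` into the pipe chunk by chunk. */
static int child_send(int pipe_wr, void *region, size_t region_size)
{
    char *p = region;
    size_t left = region_size;

    while (left > 0) {
        struct iovec iov = {
            .iov_base = p,
            /* each vmsplice() call is bounded by the pipe capacity */
            .iov_len  = left < 65536 ? left : 65536,
        };
        ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);
        if (n < 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Parent side: splice the pipe contents straight to the output file. */
static int parent_receive(int pipe_rd, int out_fd, size_t region_size)
{
    size_t left = region_size;

    while (left > 0) {
        /* SPLICE_F_MOVE is only a hint to the kernel. */
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, left, SPLICE_F_MOVE);
        if (n <= 0)
            return n == 0 ? 0 : -1;
        left -= (size_t)n;
    }
    return 0;
}
```

Note that without SPLICE_F_GIFT the pages are only referenced, not handed
over, so the child must leave them untouched until the parent has drained
the pipe.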