>> For what is required for checkpoint-restore is -- we want to query the kernel
>> for "what pages has been written to since moment X". But this "moment X" is
>> a little bit more tricky than just "mark all pages r/o". Consider we're doing
>> this periodically. So when defining the moment X for the 2nd time we should
>> query the "changed" state and remap the respective page r/o atomically. Full
>> snapshot is actually not required, since we don't need to keep the old copy
>> of a page that is written to. Just a sign, that this page was modified is OK.
>
> How is all this going to work, btw? What is the interface to query
> page states and set them read-only? How will dirty pagecache and dirty
> swapcache be handled? And anonymous memory?

To begin with: criu currently dumps a lot of information about a process
by injecting parasite code into it [1] and working on the process state
as if the process were dumping itself.

That said, the API proposed in this set is meant to be used like this:

1. A daemon is started that turns tracing on, enables the proposed mmu.*
   events and starts listening for them.
2. The parasite code gets injected into the target task. The parasite
   knows which mapping(s) we're about to take into the image.
3. The parasite first sends the needed pages [2] to the image file.
4. Then the parasite calls the proposed madvise(MADV_TRACE) on the mapping.
   When called, the respective mapping is marked with the VM_TRACE bit and
   all its pages are remapped read-only.
5. After this the parasite can be removed and the target task is resumed.

If a process writes to some page after this, a #PF occurs and the
respective event is sent via the tracing engine.

The next time we want to take an incremental dump, we repeat steps 2
through 5 with a small change: in step 3 the parasite asks the daemon
from step 1 which pages have been changed since last time and dumps only
those into the new image.

The state of swapcache (clean or dirty) doesn't matter in this case.
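
To make the bookkeeping in steps 1 through 5 concrete, here is a toy
userspace model in Python. It is only a sketch under loud assumptions:
DirtyTracker, on_write_fault, collect_and_rearm and dump are made-up
names, a dict stands in for the task's memory, and nothing here talks to
the proposed madvise(MADV_TRACE) or the mmu.* events. The one point it
tries to show is that "query the changed pages" and "start a new
interval" happen as a single step.

```python
import threading


class DirtyTracker:
    """Toy stand-in for the daemon from step 1 (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dirty = set()  # page numbers written since the last re-arm

    def on_write_fault(self, page):
        # Models one write-fault event arriving via the tracing engine.
        with self._lock:
            self._dirty.add(page)

    def collect_and_rearm(self):
        # Hand back the pages changed since "moment X" and begin a new
        # interval under one lock, so no event falls between the two.
        with self._lock:
            changed, self._dirty = self._dirty, set()
        return changed


def dump(memory, tracker, first_pass):
    """Return {page: contents} for the pages that go into this image."""
    if first_pass:
        pages = set(memory)                  # step 3: all needed pages
        tracker.collect_and_rearm()          # forget events from before the dump
    else:
        pages = tracker.collect_and_rearm()  # step 3: only the changed pages
    return {p: memory[p] for p in sorted(pages)}


# Usage: one full dump, one write, then an incremental dump.
memory = {0: b"a", 1: b"b", 2: b"c"}
tracker = DirtyTracker()
full = dump(memory, tracker, first_pass=True)
memory[1] = b"B"
tracker.on_write_fault(1)
incr = dump(memory, tracker, first_pass=False)
```

In this model the incremental image holds only the page that was written
after the full dump; the real mechanism would get the same effect from the
write fault on a page that step 4 remapped read-only.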
If the page is in swap and the pte contains a swap entry, we'll note this
from the pagemap file and will take the page into the image in the first
pass. If a process later writes to the page, it will go through
do_swap_page -> do_wp_page, and the modification event will be sent and
caught by the daemon from step 1.

The pagecache is completely out of scope, since criu doesn't dump the
contents of file mappings and doesn't snapshot filesystem state. It only
works with the process' state. The filesystem state that corresponds to
the process state should be created by other means, e.g. an lvm snapshot
or rsync while the tasks are stopped. I've tried to explain this in more
detail here [3].

Thanks,
Pavel

[1] http://lwn.net/Articles/454304/
[2] Looking at the /proc/PID/pagemap file
[3] https://plus.google.com/103175467322423551911/posts/UAtVKaQcKsx