Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

>> For what is required for checkpoint-restore is -- we want to query the kernel
>> for "what pages has been written to since moment X". But this "moment X" is
>> a little bit more tricky than just "mark all pages r/o". Consider we're doing
>> this periodically. So when defining the moment X for the 2nd time we should
>> query the "changed" state and remap the respective page r/o atomically. Full
>> snapshot is actually not required, since we don't need to keep the old copy
>> of a page that is written to. Just a sign, that this page was modified is OK.
> 
> How is all this going to work, btw?  What is the interface to query
> page states and set them read-only?  How will dirty pagecache and dirty
> swapcache be handled?  And anonymous memory?

To begin with -- criu currently dumps a lot of information about a process
by injecting parasite code into it [1] and then working on the process
state as if the process were dumping itself.

That said, the API proposed in this set is intended to be used like this:

1. A daemon is started. It turns tracing on, enables the proposed mmu.*
   events and starts listening for them.
2. The parasite code gets injected into the target task. This parasite knows
   which mapping(s) we're about to write to the image.
3. The parasite first sends the needed pages [2] to the image file.
4. Then the parasite calls the proposed madvise(MADV_TRACE) on the mapping.
   When called, the respective mapping is marked with the VM_TRACE bit and
   all its pages are remapped read-only.
5. After this the parasite can be removed and the target task is resumed.

If the process writes to some page after this, a #PF occurs and the respective
event is sent via the tracing engine. Next time, when we want to take an
incremental dump, we repeat steps 2 through 5 with a small change -- in step 3
the parasite asks the daemon from step 1 which pages have been changed since
last time and dumps only those into the new image.
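
The bookkeeping on the daemon side could look roughly like the sketch below.
This is only an illustration, not code from the patch set: struct
dirty_tracker and the tracker_*() helpers are hypothetical names, and it
assumes the daemon has already parsed the faulting address out of each event.
It does not address making the "query and re-arm" step atomic with respect to
the target task.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-mapping state kept by the daemon from step 1. */
struct dirty_tracker {
    uint64_t start;        /* first address of the traced mapping */
    size_t npages;         /* number of pages in the mapping */
    size_t page_size;      /* page size in bytes */
    unsigned char *bits;   /* one bit per page: written since last dump */
};

static int tracker_init(struct dirty_tracker *t, uint64_t start,
                        size_t npages, size_t page_size)
{
    t->start = start;
    t->npages = npages;
    t->page_size = page_size;
    t->bits = calloc((npages + 7) / 8, 1);
    return t->bits ? 0 : -1;
}

/* Called for every write-fault event; addresses outside the mapping
 * are ignored, repeated faults on one page are folded into one bit. */
static void tracker_mark(struct dirty_tracker *t, uint64_t addr)
{
    uint64_t idx;

    if (addr < t->start)
        return;
    idx = (addr - t->start) / t->page_size;
    if (idx >= t->npages)
        return;
    t->bits[idx / 8] |= (unsigned char)(1u << (idx % 8));
}

/* Answer the parasite's "what changed since last time" request:
 * copy out up to max page addresses and clear their bits, so the
 * next incremental dump starts from a clean state. */
static size_t tracker_collect(struct dirty_tracker *t, uint64_t *out,
                              size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < t->npages && n < max; i++) {
        unsigned char mask = (unsigned char)(1u << (i % 8));

        if (t->bits[i / 8] & mask) {
            out[n++] = t->start + (uint64_t)i * t->page_size;
            t->bits[i / 8] &= (unsigned char)~mask;
        }
    }
    return n;
}
```

A bitmap is used here because only "was this page modified" matters, not how
many times or with what data. Pages that do not fit into the output buffer
keep their bits set and are returned by the next tracker_collect() call.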

The state of the swapcache (clean or dirty) doesn't matter in this case. If
the page is in swap and the pte contains a swap entry, we'll see this in the
pagemap file and will take the page into the image in the first pass. If the
process later writes to the page, it will go through do_swap_page ->
do_wp_page and the modification event will be sent and caught by the daemon
from step 1.
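
For reference, the pagemap check mentioned above can be sketched as follows.
The bit positions (63 for "present", 62 for "swapped") are the ones described
in the kernel's pagemap documentation; the helper names page_needs_dump() and
pagemap_offset() are made up for this example and are not taken from criu.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of one 64-bit /proc/PID/pagemap entry. */
#define PM_PRESENT (UINT64_C(1) << 63)  /* page is present in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62)  /* pte holds a swap entry */

/* Hypothetical helper: a page has contents worth taking into the
 * first-pass image if it is either in RAM or in swap. */
static bool page_needs_dump(uint64_t entry)
{
    return (entry & (PM_PRESENT | PM_SWAPPED)) != 0;
}

/* Byte offset inside the pagemap file of the entry that describes
 * the page containing vaddr (one 8-byte entry per virtual page). */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * sizeof(uint64_t);
}
```

A reader would pread() 8 bytes at pagemap_offset() for each page of the
mapping and pass the value to page_needs_dump().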

The pagecache is completely out of scope, since criu doesn't dump the
contents of file mappings and doesn't snapshot filesystem state. It only
works with the process state. The filesystem state that corresponds to the
process state should be created by other means, e.g. an lvm snapshot or rsync
while the tasks are stopped. I've tried to explain this in more detail
here [3].


Thanks,
Pavel

[1] http://lwn.net/Articles/454304/
[2] By looking at the /proc/PID/pagemap file
[3] https://plus.google.com/103175467322423551911/posts/UAtVKaQcKsx
