Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 3 Dec 2012 14:43:10 -0800

On Fri, 30 Nov 2012 21:55:00 +0400
Pavel Emelyanov <xemul@xxxxxxxxxxxxx> wrote:

> This is an attempt to implement support for memory snapshots for the
> checkpoint-restore project (http://criu.org).
> 
> To create a dump of an application (or applications) we save all the
> information about it to files. Unsurprisingly, the biggest part of such a dump
> is the contents of the tasks' memory. However, in some usage scenarios it's not
> necessary to get _all_ of the task memory while creating a dump. For example,
> when doing periodic dumps it's only necessary to take a full memory dump at the
> first step and then take incremental changes of memory. Another example is live
> migration. In its simplest form it looks like this: create a dump, copy it to
> the remote node, then restore the tasks from the dump files. While this whole
> dump-copy-restore sequence is going on, the processes must be stopped. However,
> if we can monitor how tasks change their memory, we can dump and copy it in
> smaller chunks, periodically updating it, and thus freeze the tasks only at the
> very end, for a very short time, to pick up the recent changes.
> 
> That said, some help from the kernel is required to watch how processes modify
> the contents of their memory. I'd like to propose one possible solution to
> this, using page faults and trace events.
> 
> Briefly, the approach is: remap some memory regions as read-only, take the #pf
> when a task attempts to modify the memory, and issue a trace event for it. Since
> we're only interested in parts of the memory of some tasks, make it possible to
> mark the vmas we're interested in and issue events for them only. Also, so that
> we are aware of tasks unmapping the vmas being watched, issue an event when a
> marked vma is removed (and, for symmetry, an event when a vma is marked).
> 
> What do you think about this approach? Is this way of supporting memory
> snapshots OK with you, or should we come up with a better one?

The patches look pretty simple.

Some performance numbers would be useful.

Is it reliable?  Under what circumstances will the trace system drop
events?

Please cc Steven Rostedt on tracing stuff - he is a diligent reviewer.

The proposed interface might be useful for things other than c/r.  But
it hasn't actually been described.  Please include a full description
of the proposed kernel/userspace interface.

Two alternatives come to mind:

1)  Use /proc/pid/pagemap (Documentation/vm/pagemap.txt) in some
    fashion to determine which pages have been touched.

2)  At pagefault time, don't send an event: just mark the vma as
    "touched".  Then add a userspace interface to sweep the vma tree,
    testing, clearing and reporting the touched flags.

2a) Avoid the full linear search by propagating the "touched" flag
    up the rbtree and doing the sweep in a fashion similar to
    radix_tree_for_each_tagged().
