Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 04 Dec 2012 09:15:10 +0400
Pavel Emelyanov <xemul@xxxxxxxxxxxxx> wrote:

> 
> > Two alternatives come to mind:
> > 
> > 1)  Use /proc/pid/pagemap (Documentation/vm/pagemap.txt) in some
> >     fashion to determine which pages have been touched.
> 
> I thought about this. Unfortunately there's no free bits left in the pagemap
> entry. What can we do about it (other than introducing the pagemap2 file)?

urgh, we were pretty careless in laying out the /proc/pid/pagemap
entries.

Probably the 55 bits for pfn/swap were excessive.

The page shift didn't need six bits!  Simply predividing the page shift
by 1k would have saved a few bits, and permitting expansion to a 1^63
byte page size is nuts.

Sigh.  I wonder how traumatic it would be to put the pagemap record on
a diet and make up some free space.


Anyway, do you actually need to add another bit?  /proc/pid/pagemap
gives you the pfn which can then be used to look up the page's flags in
/proc/pageflags.  You can add a "touched" flag to /proc/kpageflags? 
But that would require grabbing another bit in struct page.flags, I
assume.

And it would be very expensive.  An in-kernel loop which searches the
MM spitting out a string of touched-pages would be faster, but still
slow.

hm.

> > 2)  At pagefault time, don't send an event: just mark the vma as
> >     "touched".  Then add a userspace interface to sweep the vma tree
> >     testing, clearing and reporting the touched flags.
> 
> Per-vma granularity is not enough. In OpenVZ we've observed Oracle touching
> several pages in a hundred-megs anon mapping. Marking _part_ of the vma with
> the "node write-faults" bit would help, but there's currently no APIs that
> modifies vma and report some info back at the same time. Can you propose how
> it could look like?

I don't see a need to report the info back at the same time?  You want
to *record* that information but only report it when someone does a
query?

Dunno.  One could add a radix-tree to the vma and store 32 or 64
per-page bits in each slots[] entry.  Worst case that would consume
approx one bit of kernel memory for each 4k of instantiated user pages
- an increase of 1/32768.  Not too bad.  Use the tagged-lookup facility
to efficiently query that bitmap at query-time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]