On Mon, 19 Nov 2012 12:07:10 -0600, Cliff Wickman <cpw at sgi.com> wrote:

> On Fri, Nov 16, 2012 at 03:39:44PM -0500, Vivek Goyal wrote:
> > On Thu, Nov 15, 2012 at 04:52:40PM -0600, Cliff Wickman wrote:
> > >
> > > Gentlemen,
> > >
> > > I know this is rather late to the game, given all the recent work
> > > to speed up makedumpfile and reduce the memory that it consumes.
> > > But I've been experimenting with asking the kernel to scan the
> > > page tables instead of reading all those page structures through
> > > /proc/vmcore.
> > >
> > > The results are rather dramatic -- if they weren't I would not
> > > presume to suggest such a radical path.
> > > On a small, idle UV system: about 4 sec. versus about 40 sec.
> > > On an 8TB UV the unnecessary-page scan alone takes 4 minutes,
> > > vs. about 200 min. through /proc/vmcore.
> > >
> > > I have not compared it to your version 1.5.1, so I don't know if
> > > your recent work provides similar speedups.
> >
> > I suggest trying 1.5.1-rc. IIUC, we had the logic of going through
> > the page tables, but that required one single bitmap to be present,
> > and in a constrained memory environment we will not have that.
> >
> > That's when the idea came up to scan a portion of the struct page
> > range, filter it, dump it, and then move on to the next range.
> >
> > If the difference is still this dramatic even with 1.5.1-rc, that
> > means we are not doing something right in makedumpfile and it
> > needs to be fixed/optimized.
> >
> > But moving the logic into the kernel does not make much sense to
> > me at this point, until and unless there is a good explanation of
> > why user space can't do a good job of what the kernel is doing.
>
> I tested a patch in which makedumpfile does nothing but scan all the
> page structures using /proc/vmcore. It simply reads each consecutive
> range of page structures in readmem() chunks of 512 structures, and
> does nothing more than accumulate a hash total of the 'flags' field
> of each page (as a sanity check). On my test machine there are 6
> blocks of page structures, totaling 12 million structures. This
> takes 31.1 'units of time' (I won't say seconds, as the clock seems
> to run far too fast in the crash kernel). If I increase the buffer
> size to 5120 structures: 31.0 units. At 51200 structures: 30.9. So
> buffer size has virtually no effect.
>
> I also asked the kernel to do the same thing. Each of the 6 requests
> asks the kernel to scan a range of page structures and accumulate a
> hash total of the 'flags' field (and also to copy a 10000-element
> pfn list back to user space, to verify that such copies don't add
> significant overhead). The 12 million pages are scanned in 1.6
> 'units of time'.
>
> If I compare the time for the actual page scanning (unnecessary
> pages and free pages) through /proc/vmcore vs. asking the kernel to
> do the scanning: 40 units vs. 3.8 units.
>
> My conclusion is that makedumpfile's page scanning procedure is
> dominated by the overhead of copying page structures through
> /proc/vmcore, and that this is about 20x slower than letting the
> kernel access the pages.

Understood. I wonder if we can get the same speed if makedumpfile
mmaps /proc/vmcore instead of reading it.

It's just a quick idea, so maybe the first thing to find out is
whether /proc/vmcore implements mmap(), but if the bottleneck is
indeed copy_to_user(), then this should help.

Stay tuned,
Petr Tesarik
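
For illustration only, here is a minimal sketch of the mmap idea
suggested above. It assumes /proc/vmcore supports mmap() (which is
exactly the open question in this thread) and that the file offset of
the struct page array, the size of a struct page in the dump, and the
offset of its 'flags' field are already known; the constants below are
hypothetical placeholders, not real makedumpfile values.

/*
 * Sketch: scan a range of struct page entries by mmap()ing
 * /proc/vmcore instead of read()ing it, accumulating a hash of each
 * page's 'flags' field as a sanity check (mirroring the test above).
 *
 * All constants are hypothetical; a real tool would take them from
 * VMCOREINFO and the ELF headers of /proc/vmcore.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	off_t mem_map_off = 0x10000000;      /* hypothetical: file offset of mem_map */
	size_t SIZE_PAGE  = 64;              /* hypothetical: sizeof(struct page) in the dump */
	size_t OFF_FLAGS  = 0;               /* hypothetical: offsetof(struct page, flags) */
	unsigned long npages = 12UL * 1000 * 1000;

	int fd = open("/proc/vmcore", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	long pgsz = sysconf(_SC_PAGESIZE);
	off_t map_off = mem_map_off & ~((off_t)pgsz - 1);  /* mmap offset must be page-aligned */
	size_t skew = mem_map_off - map_off;
	size_t len = skew + npages * SIZE_PAGE;

	/* One large mapping for simplicity; a real scanner would map in chunks. */
	char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, map_off);
	if (map == MAP_FAILED) {
		perror("mmap");   /* fails if /proc/vmcore has no mmap support */
		return 1;
	}

	uint64_t hash = 0;
	char *p = map + skew;
	for (unsigned long i = 0; i < npages; i++, p += SIZE_PAGE) {
		uint64_t flags;
		memcpy(&flags, p + OFF_FLAGS, sizeof(flags));
		hash ^= flags + 0x9e3779b97f4a7c15ULL + (hash << 6) + (hash >> 2);
	}

	printf("flags hash: %llx\n", (unsigned long long)hash);
	munmap(map, len);
	close(fd);
	return 0;
}

If mmap() is available, the scan avoids one copy_to_user() per
readmem() chunk, which is the overhead the measurements above point
to; whether that closes the 20x gap is what would need to be measured.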