On Mon, 19 Nov 2012 12:07:10 -0600, Cliff Wickman <cpw at sgi.com> wrote:

> On Fri, Nov 16, 2012 at 03:39:44PM -0500, Vivek Goyal wrote:
> > On Thu, Nov 15, 2012 at 04:52:40PM -0600, Cliff Wickman wrote:
> > >
> > > Gentlemen,
> > >
> > > I know this is rather late to the game, given all the recent work
> > > to speed up makedumpfile and reduce the memory that it consumes.
> > > But I've been experimenting with asking the kernel to scan the
> > > page tables instead of reading all those page structures through
> > > /proc/vmcore.
> > >
> > > The results are rather dramatic -- if they weren't I would not
> > > presume to suggest such a radical path.
> > > On a small, idle UV system: about 4 sec. versus about 40 sec.
> > > On an 8TB UV the unnecessary-page scan alone takes 4 minutes,
> > > vs. about 200 min. through /proc/vmcore.
> > >
> > > I have not compared it to your version 1.5.1, so I don't know if
> > > your recent work provides similar speedups.
> >
> > I suggest trying 1.5.1-rc. IIUC, we had the logic of going through
> > the page tables, but that required one single bitmap to be present,
> > and in a constrained memory environment we will not have that.
> >
> > That's when the idea came up to scan a portion of the struct page
> > range, filter it, dump it, and then move on to the next range.
> >
> > If the difference is still this dramatic even with 1.5.1-rc, that
> > means we are not doing something right in makedumpfile and it
> > needs to be fixed/optimized.
> >
> > But moving the logic into the kernel does not make much sense to
> > me at this point, until and unless there is a good explanation of
> > why user space can't do a good job of what the kernel is doing.
>
> I tested a patch in which makedumpfile does nothing but scan all the
> page structures using /proc/vmcore. It simply reads each consecutive
> range of page structures in readmem() chunks of 512 structures, and
> does nothing more than accumulate a hash total of the 'flags' field
> of each page (as a sanity check). On my test machine there are 6
> blocks of page structures, totaling 12 million structures. This
> takes 31.1 'units of time' (I won't say seconds, as the clock seems
> to run far too fast in the crash kernel). If I increase the buffer
> size to 5120 structures: 31.0 units. At 51200 structures: 30.9. So
> buffer size has virtually no effect.
>
> I also asked the kernel to do the same thing. Each of the 6 requests
> asks the kernel to scan a range of page structures and accumulate a
> hash total of the 'flags' field (and also to copy a 10000-element
> pfn list back to user space, to verify that such copies don't add
> significant overhead). The 12 million pages are scanned in 1.6
> 'units of time'.
>
> If I compare the time for the actual page scanning (unnecessary
> pages and free pages) through /proc/vmcore vs. asking the kernel to
> do the scanning: 40 units vs. 3.8 units.
>
> My conclusion is that makedumpfile's page scanning procedure is
> dominated by the overhead of copying page structures through
> /proc/vmcore, and that this is about 20x slower than letting the
> kernel access the pages.

Understood. I wonder if we can get the same speed if makedumpfile
mmaps /proc/vmcore instead of reading it.

It's just a quick idea, so maybe the first thing to find out is
whether /proc/vmcore implements mmap(), but if the bottleneck is
indeed copy_to_user(), then this should help.

Stay tuned,
Petr Tesarik
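
For illustration only, here is a minimal sketch of the mmap idea
suggested above. It assumes /proc/vmcore supports mmap() (which is
exactly the open question in this thread) and that the file offset of
the struct page array, the size of a struct page in the dump, and the
offset of its 'flags' field are already known; the constants below are
hypothetical placeholders, not real makedumpfile values.

/*
 * Sketch: scan a range of struct page entries by mmap()ing
 * /proc/vmcore instead of read()ing it, accumulating a hash of each
 * page's 'flags' field as a sanity check (mirroring the test above).
 *
 * All constants are hypothetical; a real tool would take them from
 * VMCOREINFO and the ELF headers of /proc/vmcore.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	off_t mem_map_off = 0x10000000;      /* hypothetical: file offset of mem_map */
	size_t SIZE_PAGE  = 64;              /* hypothetical: sizeof(struct page) in the dump */
	size_t OFF_FLAGS  = 0;               /* hypothetical: offsetof(struct page, flags) */
	unsigned long npages = 12UL * 1000 * 1000;

	int fd = open("/proc/vmcore", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	long pgsz = sysconf(_SC_PAGESIZE);
	off_t map_off = mem_map_off & ~((off_t)pgsz - 1);  /* mmap offset must be page-aligned */
	size_t skew = mem_map_off - map_off;
	size_t len = skew + npages * SIZE_PAGE;

	/* One large mapping for simplicity; a real scanner would map in chunks. */
	char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, map_off);
	if (map == MAP_FAILED) {
		perror("mmap");   /* fails if /proc/vmcore has no mmap support */
		return 1;
	}

	uint64_t hash = 0;
	char *p = map + skew;
	for (unsigned long i = 0; i < npages; i++, p += SIZE_PAGE) {
		uint64_t flags;
		memcpy(&flags, p + OFF_FLAGS, sizeof(flags));
		hash ^= flags + 0x9e3779b97f4a7c15ULL + (hash << 6) + (hash >> 2);
	}

	printf("flags hash: %llx\n", (unsigned long long)hash);
	munmap(map, len);
	close(fd);
	return 0;
}

If mmap() is available, the scan avoids one copy_to_user() per
readmem() chunk, which is the overhead the measurements above point
to; whether that closes the 20x gap is what would need to be measured.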