(2013/06/28 6:17), Vivek Goyal wrote:
> On Fri, Jun 21, 2013 at 09:17:14AM -0500, Cliff Wickman wrote:
>
> Try using snappy or lzo for faster compression.
>
>> So a good workaround for a very large system might be to dump uncompressed
>> to an SSD.
>
> Interesting.
>
>> The multi-threading of the crash kernel would produce a big gain.
>
> Hatayama once was working on patches to bring up multiple cpus in the second
> kernel. Not sure what happened to those patches.
>
>> - Use of mmap on /proc/vmcore increased page scanning speed from 4.4 minutes
>>   to 3 minutes. It also increased data copying speed (unexpectedly) from
>>   38 min. to 35 min.
>
> Hmm.., so on large memory systems, mmap() will not help a lot? In those
> systems dump times are dominated by disk speed and compression time.
>
> So far I was thinking that ioremap() per page was the big issue, and you
> also once did an analysis showing that passing a page list to the kernel
> made things significantly faster.
>
> So on 32TB machines, if it takes 2 hrs to save the dump and mmap() shortens
> it by only a few minutes, it really is not a significant win.
>

Sorry, I've explained this earlier on this ML. Some patches have been applied
to makedumpfile to improve filtering speed. Two changes were useful for the
improvement: one implements an 8-slot cache of physical pages to reduce the
number of /proc/vmcore accesses needed for paging (much like a TLB), and the
other cleans up makedumpfile's filtering path. The performance degradation
from ioremap() is now hidden on a single cpu, but it would occur again on
multiple cpus. Sorry, but I have yet to run a benchmark showing this cleanly
in numbers.

-- 
Thanks.
HATAYAMA, Daisuke