(2013/06/28 6:17), Vivek Goyal wrote:
> On Fri, Jun 21, 2013 at 09:17:14AM -0500, Cliff Wickman wrote:
>
> Try using snappy or lzo for faster compression.
>
>> So a good workaround for a very large system might be to dump uncompressed
>> to an SSD.
>
> Interesting.
>
>> The multi-threading of the crash kernel would produce a big gain.
>
> Hatayama once was working on patches to bring up multiple cpus in the second
> kernel. Not sure what happened to those patches.
>
>> - Use of mmap on /proc/vmcore increased page scanning speed from 4.4 minutes
>>   to 3 minutes. It also increased data copying speed (unexpectedly) from
>>   38 min. to 35 min.
>
> Hmm.., so on large memory systems, mmap() will not help a lot? In those
> systems dump times are dominated by disk speed and compression time.
>
> So far I was thinking that ioremap() per page was the big issue, and you
> also once did an analysis showing that passing a page list to the kernel
> made things significantly faster.
>
> So on 32TB machines, if it takes 2 hrs to save the dump and mmap() shortens
> it by only a few minutes, it really is not a significant win.
>

Sorry, I've explained this earlier on this ML. Some patches have been applied
to makedumpfile to improve filtering speed. Two changes were useful for the
improvement: one implements an 8-slot cache of physical pages to reduce the
number of /proc/vmcore accesses needed for paging (much like a TLB), and the
other cleans up makedumpfile's filtering path. The performance degradation
from ioremap() is now hidden on a single cpu, but it would occur again on
multiple cpus. Sorry, but I have yet to run a benchmark showing this cleanly
in numbers.

-- 
Thanks.
HATAYAMA, Daisuke