On Fri, Apr 05, 2013 at 12:04:02AM +0000, HATAYAMA Daisuke wrote:
> Currently, reads of /proc/vmcore are done by read_oldmem(), which uses
> ioremap/iounmap for each single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB) = 262144 times. This causes big
> performance degradation.
>
> In particular, the main user of this mmap() is makedumpfile, which not
> only reads memory from /proc/vmcore but also does other processing like
> filtering, compression and I/O work.
>
> To address the issue, this patch implements mmap() on /proc/vmcore to
> improve read performance.
>
> Benchmark
> =========
>
> You can see two benchmarks on terabyte memory systems. Both show about
> 40 seconds on a 2TB system. This is almost equal to the performance of
> experimental kernel-side memory filtering.
>
> - makedumpfile mmap() benchmark, by Jingbai Ma
>   https://lkml.org/lkml/2013/3/27/19
>
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
>   https://lkml.org/lkml/2013/3/26/914
>
> ChangeLog
> =========
>
> v3 => v4)
>
> - Rebase on 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 1st kernel just as in v1. Allocate
>   vmcore objects per page. => See [PATCH 5/8]
> - Map memory referenced by a PT_LOAD entry directly even if the start or
>   end of the region doesn't fit inside a page boundary, no longer copying
>   it as in the previous v3. Then holes, outside OS memory, are visible
>   from /proc/vmcore. => See [PATCH 7/8]
>
> v2 => v3)
>
> - Rebase on 3.9-rc3.
> - Copy program headers separately from e_phoff into the ELF note segment
>   buffer. Now there's no risk of allocating huge memory if the program
>   header table is positioned after the memory segment.
> - Add a cleanup patch that removes an unnecessary variable.
> - Fix wrongly using the variable holding the buffer size configurable at
>   runtime. Instead, use the variable that holds the original buffer size.
>
> v1 => v2)
>
> - Clean up the existing code: use e_phoff, and remove the assumption
>   on PT_NOTE entries.
> - Fix a potential bug where the ELF header size was not included in the
>   exported vmcoreinfo size.
> - Divide the patch modifying read_vmcore() into two: clean-up and primary
>   code change.
> - Put ELF note segments on a page-size boundary in the 1st kernel
>   instead of copying them into a buffer in the 2nd kernel.
>
> Test
> ====
>
> This patch set is composed based on v3.9-rc7.
>
> Tested on x86-64 and x86-32, both with 1GB and over-4GB memory
> environments.
>
> ---
>
> HATAYAMA Daisuke (8):
>       vmcore: support mmap() on /proc/vmcore
>       vmcore: treat memory chunks referenced by PT_LOAD program header
>               entries in page-size boundary in vmcore_list
>       vmcore: count holes generated by round-up operation for page boundary
>               for size of /proc/vmcore
>       vmcore: copy ELF note segments in the 2nd kernel per page vmcore objects
>       vmcore: Add helper function vmcore_add()
>       vmcore, procfs: introduce MEM_TYPE_CURRENT_KERNEL flag to distinguish
>               objects copied in 2nd kernel
>       vmcore: clean up read_vmcore()
>       vmcore: allocate buffer for ELF headers on page-size alignment
>
>  fs/proc/vmcore.c        | 349 ++++++++++++++++++++++++++++++++---------------
>  include/linux/proc_fs.h |   8 +
>  2 files changed, 245 insertions(+), 112 deletions(-)
>
> --
>
> Thanks.
> HATAYAMA, Daisuke

This is a very important patch set for speeding up the kdump process
(patches 1-8).

We have found the mmap interface to /proc/vmcore to be about 80x faster
than the read interface. That is, doing mmaps and copying data (in
pieces the size of page structures) transfers all of /proc/vmcore about
80 times faster than reading it. This greatly speeds up the capture of a
kdump, as the scan of page structures takes the bulk of the time in
dumping the OS on a machine with terabytes of memory.

We would very much like to see this set make it into the 3.10 release.
Acked-by: Cliff Wickman <cpw at sgi.com>

-Cliff
-- 
Cliff Wickman
SGI
cpw at sgi.com
(651) 683-3824