I have been testing recent kernel and kexec-tools for doing kdump of large memories, and found good results.

--------------------------------
UV2000  memory: 32TB  crashkernel=2G@4G
command line:
  /usr/bin/makedumpfile --non-cyclic -c --message-level 23 -d 31 \
     --map-size 4096 -x /boot/vmlinux-3.10.0-rc5-linus-cpw+ /proc/vmcore \
     /tmp/cpw/dumpfile

page scanning    570 sec.
copying data    5795 sec. (72G)
(The data copy ran out of disk space at 23%, so the time and size above are extrapolated.)

--------------------------------
UV1000  memory: 8.85TB  crashkernel=1G@5G
command line:
  /usr/bin/makedumpfile --non-cyclic -c --message-level 23 -d 31 \
     --map-size 4096 -x /boot/vmlinux-3.9.6-cpw-medusa /proc/vmcore \
     /tmp/cpw/dumpfile

page scanning    175 sec.
copying data    2085 sec. (15G)
(The data copy ran out of disk space at 60%, so the time and size above are extrapolated.)

--------------------------------
Notes/observations:
- These systems were idle, so this is basically a capture of system memory only.
- Both stable 3.9.6 and 3.10.0-rc5 worked.
- Use of crashkernel=1G,high was usually problematic. I assume it conflicts with something else that uses high memory. Instead I always use a fixed placement such as 1G@5G, finding free memory by examining /proc/iomem.
- Time for copying data is dominated by data compression. Writing 15G of compressed data to /dev/null takes about 35 min. Writing the same data uncompressed (140G) to /dev/null takes about 6 min. So a good workaround for a very large system might be to dump uncompressed to an SSD. Multi-threading in the crash kernel would produce a big gain.
- Using mmap on /proc/vmcore cut page-scanning time from 4.4 minutes to 3 minutes. It also (unexpectedly) cut data-copying time from 38 min to 35 min. So I think it is worthwhile to push Hatayama's 9-patch set into the kernel.
- I applied a 5-patch set from Takao Indoh to fix reset_devices handling of PCI devices.
  I also applied three kernel hacks of my own:
    - making a "Crash kernel low" section in /proc/iomem
    - making crashkernel avoid some things in pci_swiotlb_detect_override(), pci_swiotlb_detect_4gb() and register_mem_sect_under_node()
    - doing a crashkernel return from cpu_up()
  I don't understand why these should be necessary for my kernels when they are not reported as problems elsewhere. I'm still investigating and will discuss those patches separately.
- My makedumpfile is an mmap-using version, with about 10 patches applied. I'll check which of those are not in the common version and discuss them separately.
- My kexec is version 2.0.4 with 3 patches applied. I'll check which of those are not in the common version and discuss them separately.

-Cliff
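As a rough check on the compression note above, the arithmetic below converts the /dev/null timings into average rates. It assumes "G" means GiB and that the 140G uncompressed and 15G compressed figures describe the same pages; the 15G is itself extrapolated. The perfect-scaling thread count is an idealized bound under that assumption, not a measurement.

```python
import math

MIB_PER_GIB = 1024


def throughput_mib_s(size_gib: float, minutes: float) -> float:
    """Average rate, in MiB/s, for processing size_gib GiB in `minutes` minutes."""
    return size_gib * MIB_PER_GIB / (minutes * 60)


# Figures from the UV1000 note: 140G of page data written to /dev/null,
# which compresses to about 15G.
uncompressed_rate = throughput_mib_s(140, 6)   # input consumed without compression
compressed_rate = throughput_mib_s(140, 35)    # input consumed with compression
compression_ratio = 140 / 15
slowdown = 35 / 6
# Idealized: if compression scaled perfectly across threads (an assumption),
# this many threads would bring the compressed run down to the uncompressed time.
threads_to_match = math.ceil(slowdown)

print(f"uncompressed: {uncompressed_rate:.0f} MiB/s")
print(f"compressed:   {compressed_rate:.0f} MiB/s")
print(f"ratio {compression_ratio:.1f}:1, slowdown {slowdown:.1f}x, "
      f"threads to match: {threads_to_match}")
```

On these numbers, an uncompressed dump to an SSD sustaining more than about 68 MiB/s would already beat the compressed path. A device near 400 MiB/s would keep the dump close to the 6-minute /dev/null time.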
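The mmap note above refers to Hatayama's patch set, which lets /proc/vmcore be mmap()ed so that the dump tool can reach old-kernel memory through mappings rather than read() calls. The makedumpfile command lines above also pass --map-size 4096, which presumably sets the mapping window size. The sketch below shows only the general windowed-mmap access pattern, using Python's mmap module on an ordinary file. It maps one fixed-size window at a time and counts all-zero pages, one of the page classes that -d 31 excludes. The function name and default window size are hypothetical. makedumpfile decides most exclusions from kernel page metadata rather than by scanning contents, and a real /proc/vmcore reader would walk the ELF PT_LOAD segments instead of the whole file.

```python
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE


def count_zero_pages(path: str, window_bytes: int = 4 * 1024 * 1024) -> int:
    """Count all-zero pages in a file by mapping it one window at a time.

    window_bytes is rounded down to a multiple of ALLOCATIONGRANULARITY,
    because mmap offsets must be aligned to it.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    window = max(gran, window_bytes - window_bytes % gran)
    zero_page = bytes(PAGE_SIZE)
    zeros = 0
    with open(path, "rb") as f:
        total = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < total:
            length = min(window, total - offset)
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as m:
                for start in range(0, length, PAGE_SIZE):
                    # A trailing partial page is compared with a shorter zero run.
                    page = m[start:start + PAGE_SIZE]
                    if page == zero_page[:len(page)]:
                        zeros += 1
            offset += length
    return zeros
```

Mapping a bounded window, rather than the whole file at once, keeps the address-space footprint fixed no matter how large the dump source is. The window size trades the number of mmap() calls against the size of each mapping.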