On Wed, 2007-03-07 at 10:43 +0900, Ken'ichi Ohmichi wrote:
> I want to use the feature of excluding zero-pages, because our systems
> (x86_64) have many zero-pages immediately after system booting.
> Bob is researching for the behavior of crash on ELF format dumpfiles.
> I would like to wait for his report.

Sorry you had to wait so long.

Bob Montgomery

Here's what I know about how crash deals with ELF (netdump) dump files
compared to how it deals with kdump (diskdump) dump files.

========================================
Intro to ELF dumpfiles and zero filling:
========================================

ELF format dumpfiles do not contain a page-by-page bitmap of included
and excluded pages. Instead, a program header table describes groups of
contiguous pages that are present in the dumpfile. In its simplest form,
this allows a debugger to locate groups of pages that are present in the
file, and conversely to identify pages that are missing by failing to
find a program header entry that encloses the address of a missing page.

At some point between the 3.4 and 3.19 versions of crash, code was added
to netdump.c:netdump_read to handle a zero-fill feature in the ELF
files. In a program header entry where p_memsz (MemSiz) is bigger than
p_filesz (FileSiz), the zone between them is considered to be filled
with zeroes upon access. For example, using info from readelf on a
-E -d31 dumpfile:

Program Headers:
  Type    Offset             VirtAddr           PhysAddr
          FileSiz            MemSiz             Flags  Align
  ...
  LOAD    0x00000000004722c8 0xffff810005000000 0x0000000005000000
          0x000000000063b000 0x0000000003000000 RWE    0
  ...

For the group of pages described by this program header entry,
crash-3.19 sets up this internal representation:

  {file_offset = 0x4722c8, phys_start = 0x5000000,
   phys_end = 0x563b000, zero_fill = 0x8000000}

If FileSiz and MemSiz are the same, no zero-fill zone is needed,
phys_end is the real end of the segment, and zero_fill is set to 0x0.
If the requested address falls between phys_start and phys_end, it is
read from the computed file offset. Otherwise, if it falls between
phys_end and zero_fill, the requested buffer is memset to zero.

Here is an address that falls within the zero-fill zone shown above
(0xffff81000563c000 maps to physical address 0x563c000, which is above
phys_end, but below zero_fill):

crash-3.19> x/xg 0xffff81000563c000
0xffff81000563c000:     0x0000000000000000

gdb (6.4.90-debian) can also read this address from the ELF dumpfile:

(gdb) x/xg 0xffff81000563c000
0xffff81000563c000:     0x0000000000000000

But crash-3.4 did not have the zero-fill code and fails:

crash-3.4> x/xg 0xffff81000563c000
0xffff81000563c000: gdb: read error: kernel virtual address: ffff81000563c000  type: "gdb_readmem_callback"
Cannot access memory at address 0xffff81000563c000

==========================
Philosophical Meandering
==========================

The zero-fill feature gives an ELF dump file *three* ways to represent
pages:

1) Not In The Address Space: There is no program header LOAD entry that
   contains the requested address.

2) Not In The File, Zero Fill: A program header LOAD entry contains the
   address, but the offset in the segment is between FileSiz and MemSiz.

3) In The File: A program header LOAD entry contains the address, and
   the offset in the segment is smaller than FileSiz.

Unfortunately, makedumpfile recognizes *four* types of pages:

A) Not In The Address Space: physical holes in memory, undumpable zones

B) Excluded Type: Excluded based on page type (user memory, cached page)

C) Zero Content: Excluded based on page content being all 0x0

D) In The File

The problem at hand is that makedumpfile's current mapping of the four
types of pages onto ELF's three types of representation puts both
"B) Excluded Type" and "C) Zero Content" into ELF's "2) Not In The File,
Zero Fill" representation.
This results in crash reporting the contents of all addresses in
excluded pages as 0x0, regardless of their original value.

We have proposed to fix this problem on diskdump-format dump files by
leaving zero pages in the bitmaps and page descriptors, but pointing all
their data pointers to a single common page of zeroes for access.
Coupled with a modification to crash's diskdump.c:read_diskdump routine
to return SEEK_ERROR on excluded pages instead of zero-filling, we
achieve the goal of reading zeroes only when zeroes were in the original
address. We get an indication of read error when attempting to access a
page that has really been excluded. But we still reduce the size of the
dumpfile by storing one copy of a zero page to serve as the data image
for all zero pages.

======================================================
The Compatibility Concern, and Why It Shouldn't Matter
======================================================

One concern is that if we fix diskdump-format dump files in this way,
they will behave differently than ELF-format dump files. Actually, they
will behave correctly, while ELF-format dump files will still have the
"any excluded page must have contained zeroes" behavior.

So I'd like to assert that even with the old zero-filling behavior,
diskdump files and ELF files did not give the same results. In other
words, you could not count on getting the same result for every address
request, even with crash's zero-filling of excluded pages working for
both cases. Here is why:

The diskdump format includes a page-level bitmap of every page in the
known address space. If makedumpfile wants to exclude a page, it's as
simple as changing a 1 to a 0 in the bitmap and leaving out the page. A
huge expanse of memory with alternating pages excluded wouldn't cause
anything more alarming than a bunch of 0xaaaaaaaa words in the bitmap.
The ELF format requires a program header entry for each distinct group
of pages, with a provision for representing one contiguous group of
excluded pages at the end of the group for later zeroing. The overhead
of creating a program header for every isolated excluded page would be
prohibitive. Because of this, when makedumpfile builds its map of
excluded pages and then translates that into the ELF format dumpfile, it
only acts when it finds groups of 256 or more contiguous excluded pages.
Then it sets up separate program header entries around the exclusion
zone, and continues through the bitmap until it finds another large
contiguous section of excluded pages. This means that pages that were
meant to be excluded, but that were not in big contiguous groups, get to
stay in the dumpfile.

Here's an example, using crash-3.19 on two dumpfiles made from the same
vmcore with an unmodified makedumpfile using -d31, one ELF and one
normal (diskdump format).

=============================================================
ELF (netdump) dumpfile (-E -d31):
=============================================================

crash-3.19> sys
      KERNEL: vmlinux-2.6.18-3-telco-amd64
    DUMPFILE: dumpfile.E_d31
        CPUS: 2
        DATE: Tue Feb  6 14:56:05 2007
      UPTIME: 00:44:09
LOAD AVERAGE: 0.05, 0.03, 0.05
       TASKS: 81
    NODENAME: hpde
     RELEASE: 2.6.18-3-telco-amd64
     VERSION: #1 SMP Mon Feb 5 13:33:27 MST 2007
     MACHINE: x86_64  (1800 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Oops: 0000 [1] SMP " (check log for details)

crash-3.19> x/xg 0xffff810000005ff0
0xffff810000005ff0:     0x0000000000205007
crash-3.19> x/xg 0xffff8100cfe44800
0xffff8100cfe44800:     0xbb67bdc97f9fdefe

=============================================================
kdump (diskdump) dumpfile (-d31):
=============================================================

crash-3.19> sys
      KERNEL: ../vmlinux-2.6.18-3-telco-amd64
    DUMPFILE: dumpfile.makedumpfile-d31.run1
        CPUS: 2
        DATE: Tue Feb  6 14:56:05 2007
      UPTIME: 00:44:09
LOAD AVERAGE: 0.05, 0.03, 0.05
       TASKS: 81
    NODENAME: hpde
     RELEASE: 2.6.18-3-telco-amd64
     VERSION: #1 SMP Mon Feb 5 13:33:27 MST 2007
     MACHINE: x86_64  (1800 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Oops: 0000 [1] SMP " (check log for details)

crash-3.19> x/xg 0xffff810000005ff0
0xffff810000005ff0:     0x0000000000000000
crash-3.19> x/xg 0xffff8100cfe44800
0xffff8100cfe44800:     0x0000000000000000

The two memory accesses that return 0x0000000000000000 in the second
case were not really zero, and the ELF dumpfile doesn't say they were.
So we've never really had address-for-address compatibility between an
ELF dump and a diskdump dump of the same situation anyway.

With the proposed fixes for the diskdump format, these two accesses
would have correctly given read errors, since the information is not in
the dump file. And that's OK, because the system manager chose to
exclude certain types of pages from the dump. The ELF user could feel
lucky that they accidentally got the right answer, only because the page
in question was in a group of 10 contiguous excluded pages instead of
256 or more, and wasn't really excluded like it should have been.

=====================================
Can ELF Dumpfiles Solve This Problem?
=====================================

To achieve correctness with ELF dumpfiles, one could perhaps remap the
four types of pages to the three types of ELF representation so that
"A) Not In The Address Space" and "B) Excluded Type" were both mapped to
"1) Not In The Address Space". Then "C) Zero Content" would map to
"2) Not In The File, Zero Fill". You would lose the ability to know
whether a page was missing because it was never in the address space in
the first place, or because it was excluded because of its type. But if
you read a zero, you'd know it really was a zero.
This method probably means that makedumpfile would have to process a
bitmap of excluded (but not zero) pages against the original program
header table to create a new set of program headers that reflected the
excluded pages, and then go back and process that against a bitmap of
zero pages, to see if any zero-fill tails could be created. That is
left as an exercise for the loyal ELF user.

Otherwise, makedumpfile could quit removing pages because they're zero,
and try to achieve dumpfile size goals by removing pages based only on
type. Then zero pages that were supposed to stay would still be there,
and crash wouldn't need to fill missing pages with zeroes without
knowing whether they really contained zeroes in the first place.

==============
The Last Point
==============

The ELF format is inferior for most serious dumpfile debugging. It does
not support page compression, and its format does not allow the
exclusion of all eligible pages. Whether or not the inferior ELF format
dumpfiles can be made completely correct should not serve as a barrier
to getting it right on diskdump format dumpfiles. As demonstrated, the
two types aren't truly bug-for-bug compatible now, so fixing one without
fixing the other is still a net win.

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility