On Thu, 2006-11-09 at 08:53 -0500, Dave Anderson wrote:
> What you could do is set up makedumpfile to restrict
> pages to its maximum capability, and instead of returning
> a zero-filled page in read_diskdump(), insert an
> "error(FATAL,...)" call there.  It would be interesting
> to note whether (1) the system comes up, and (2) whether
> it effects commands that otherwise would work.

Back on 8 November 2006, I raised a concern about makedumpfile's ability to omit pages containing only zeros from the dumpfile, and the fact that the crash debugger compensates by returning 0's for any access to a page that has been omitted from the dumpfile. The undesirable result of this feature is that a request for data from a page that was omitted for one of the *other* reasons (such as being owned by a user process, or being in the page cache) will also return 0, with no warning that the page was actually not in the dump and that crash does not really know the contents of that memory location.

Takao Indoh recalled the policy that was implemented as being:

1) If the page is not found because of an invalid address (e.g. the
   address points into a memory hole), read_diskdump should return an
   error.  (This corresponds to pages not found in the first diskdump
   bitmap.)

2) If the page is not found because of a partial dump, read_diskdump
   should return a zero-filled page.  (This corresponds to pages not
   found in the second bitmap, and is the behavior I'd like to change.)

Dave suggested an experiment: change crash so that it does not return a zero-filled page for pages omitted from a partial dump, and see what happens.

And now we rejoin the present time...

My first test was to change crash to return a SEEK_ERROR from diskdump.c:read_diskdump, instead of setting up the zero buffer, when the page failed the page_is_dumpable() test. On my test dump with zero pages excluded, crash failed to come up when it attempted to read the log buffer, since kernel.c:dump_log uses FAULT_ON_ERROR for its readmem() call.
Note that my dump was made before the system's log buffer had filled and wrapped around, so it still contained some page-sized expanses of zeros that makedumpfile excluded. A dump from a system that had been up longer might have survived this step. When I changed crash to survive dump_log by switching that readmem() call from FAULT_ON_ERROR to RETURN_ON_ERROR|QUIET, I then found that the mod command did not work, because a non-fatal "cannot access vmalloc'd module memory" error kept commands having to do with modules and their symbols from working correctly.

So we know that simply changing crash to return errors on accesses to missing pages will not be satisfactory as long as zero-filled pages can be omitted from a dumpfile.

I therefore propose to change makedumpfile so that it does not exclude zero pages outright. I'd guess the feature is there because it was the original diskdump dumpfile reduction strategy, and it remains a fairly efficient way to reduce the size of certain dumps. I tried to think of a way to take the zero pages out of the dumpfile while leaving a marker saying that each really was a zero page, and not a page excluded by a "type of page" test, but I didn't want to re-invent the diskdump format.

So, to keep most of the advantage of zero-page exclusion (quick detection, easy reduction logic) without introducing a new diskdump format, I propose to continue checking for zero pages, but instead of clearing them from the bitmap, substitute a pre-computed, maximally compressed page of zeros for each one, and set the compressed flag for that page. A maximally compressed 4096-byte page of zeros comes to 26 bytes. That consumes more space than leaving the page out of the file completely, but is still small enough to preserve the advantage of testing for pages full of zeros.
Even if compression is enabled for the whole dump, it is quicker to test for all zeros first and copy in the pre-computed zero page. Doing so also saves a little more space, since the whole-dump compression option uses Z_BEST_SPEED, which only reduces a zero page to 41 bytes.

The attached patch is against makedumpfile-1.1.1. It keeps the dump-level option of excluding zero pages, but implements the exclusion by substituting the compressed zero page instead of clearing the page from the bitmap. Since the trick relies on the compression flag, it is not implemented for the ELF output format.

If this change is made to makedumpfile, there will no longer be dumpfiles with important pages missing just because they contained all zeros. Then we can approach the issue of how to make crash warn about requests for missing pages, while still maintaining compatibility with old existing diskdump files.

Thanks,
Bob Montgomery
diff -urp makedumpfile-1.1.1/makedumpfile.c makedumpfile-1.1.1-bobm/makedumpfile.c
--- makedumpfile-1.1.1/makedumpfile.c	2007-02-08 02:21:51.000000000 -0700
+++ makedumpfile-1.1.1-bobm/makedumpfile.c	2007-02-22 16:36:35.000000000 -0700
@@ -3845,8 +3845,8 @@ write_kdump_pages(struct DumpInfo *info)
 	struct page_desc pd;
 	off_t offset_data = 0, offset_memory = 0;
 	struct disk_dump_header *dh = info->dump_header;
-	unsigned char *buf = NULL, *buf_out = NULL;
-	unsigned long len_buf_out;
+	unsigned char *buf = NULL, *buf_out = NULL, *buf_zero_comp = NULL;
+	unsigned long len_buf_out, len_zero_comp;
 	struct cache_data bm2, pdesc, pdata;
 	struct dump_bitmap bitmap1, bitmap2;
 	const off_t failed = (off_t)-1;
@@ -3898,6 +3898,26 @@ write_kdump_pages(struct DumpInfo *info)
 		    strerror(errno));
 		goto out;
 	}
+
+	if (info->dump_level & DL_EXCLUDE_ZERO) {
+		/* set up a compressed zero page by compressing a page
+		 * of zeros. */
+		memset(buf, 0, info->page_size);
+		len_zero_comp = len_buf_out;
+		if (compress2(buf_out, &len_zero_comp, buf,
+		    info->page_size, Z_BEST_COMPRESSION) != Z_OK) {
+			/* unexpected */
+			ERRMSG("Can't set up a compressed zero page.\n");
+			goto out;
+		}
+		if ((buf_zero_comp = malloc(len_zero_comp)) == NULL) {
+			ERRMSG("Can't allocate memory for the compressed "
+			    "zero page. %s\n", strerror(errno));
+			goto out;
+		}
+		memcpy(buf_zero_comp, buf_out, len_zero_comp);
+	}
+
 	if ((bm2.buf = calloc(1, BUFSIZE_BITMAP)) == NULL) {
 		ERRMSG("Can't allocate memory for 2nd-bitmap buffer. %s\n",
 		    strerror(errno));
@@ -4000,21 +4020,17 @@ write_kdump_pages(struct DumpInfo *info)
 		}

 		/*
-		 * Exclude the page filled with zeros.
+		 * Check for opportunities to compress pages.
 		 */
+		size_out = len_buf_out;
 		if ((info->dump_level & DL_EXCLUDE_ZERO)
 		    && is_zero_page(buf, info->page_size)) {
-			set_bitmap(bm2.buf, pfn%PFN_BUFBITMAP, 0);
-			flag_change_bitmap = 1;
-			continue;
-		}
-		/*
-		 * Compress the page data.
-		 */
-		size_out = len_buf_out;
-		if (info->flag_compress
+			pd.flags = 1;
+			pd.size = len_zero_comp;
+			memcpy(buf, buf_zero_comp, pd.size);
+		} else if (info->flag_compress
 		    && (compress2(buf_out, &size_out, buf,
-		    info->page_size, Z_BEST_SPEED) == Z_OK)
+			info->page_size, Z_BEST_SPEED) == Z_OK)
 		    && (size_out < info->page_size)) {
 			pd.flags = 1;
 			pd.size = size_out;
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility