>>>How about compromising progress information to some extent? The first
>>>pass is intended to count up the exact number of dumpable pages just
>>>to provide precise progress information. Is such precision really
>>>needed?
>>
>> The first pass counts up num_dumpable *to calculate the offset of the
>> starting page data region in advance*, otherwise makedumpfile can't start
>> to write page data except by creating a sparse file.
>>
>> 7330 write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>> 7331 {
>> 7332         struct page_desc pd_zero;
>> 7333         off_t offset_data=0;
>> 7334         struct disk_dump_header *dh = info->dump_header;
>> 7335         unsigned char buf[info->page_size];
>> 7336         struct timeval tv_start;
>> 7337
>> 7338         /*
>> 7339          * Reset counter for debug message.
>> 7340          */
>> 7341         pfn_zero = pfn_cache = pfn_cache_private = 0;
>> 7342         pfn_user = pfn_free = pfn_hwpoison = 0;
>> 7343         pfn_memhole = info->max_mapnr;
>> 7344
>> 7345         cd_header->offset
>> 7346             = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
>> 7347             * dh->block_size;
>> 7348         cd_page->offset = cd_header->offset + sizeof(page_desc_t)*info->num_dumpable;
>> 7349         offset_data = cd_page->offset;
                              ^^^^^^^^^^^^
>>
>>
>
>I overlooked this, sorry.
>
>The size of a page descriptor is 24 bytes. This corresponds to 6 GB
>per 1 TB. Can this become a big problem? Of course, I think it odd
>that the page description table could be larger than the memory data part.

At least, it looks like the member "page_flags" can be removed, since
makedumpfile always just sets it to 0 and crash doesn't refer to it.

typedef struct page_desc {
        off_t                   offset;         /* the offset of the page data */
        unsigned int            size;           /* the size of this dump page */
        unsigned int            flags;          /* flags */
        unsigned long long      page_flags;     /* page flags */  <--- always 0, these 8 bytes are useless.
} page_desc_t;

(Sorry for getting off track here.)

Further, I have another idea that would reduce the total size of the page
descriptor table: assign one page descriptor to a number of pages, so that
multiple pages are managed as a single data block (see the illustrative
sketch at the end of this mail).

The original purpose of the idea is to improve compression performance by
compressing several pages in a lump. We know that compression with zlib is
too slow, and I suspect that one of the causes is the buffer size passed to
compress2(). When compressing a 100MB file, I expect that compressing a
10MB block 10 times will be faster than compressing a 1MB block 100 times.

Actually, I did a simple verification with the attached program (a rough
sketch of what such a benchmark could look like follows after the
attachment note below):

  # ./zlib_compress 1024 testdata
  TOTAL COMPRESSION TIME: 18.478064
  # ./zlib_compress 10240 testdata
  TOTAL COMPRESSION TIME: 5.940524
  # ./zlib_compress 102400 testdata
  TOTAL COMPRESSION TIME: 2.088867
  #

Unfortunately I haven't had a chance to work on it for a long time, but I
think it would be better to consider it together if we design a new dump
format.

>There's another approach: construct the page description table at each
>cycle separately over a dump file and connect them by a linked list.
>
>This changes the dump format and needs added crash utility support; no
>compatibility with the current crash utility.

It's interesting. I think we should improve the format if there is a good
reason; the format shouldn't be an obstacle. Of course, the new format
should be an option at first, but it would be great if there is a choice
that gives better performance.
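To make the block idea above a bit more concrete, here is an illustrative
sketch of what a per-block descriptor could look like. The struct name,
field layout and fixed block size are just examples, not a format proposal:

/*
 * Illustrative sketch only, not the actual kdump-compressed format.
 * One descriptor covers PAGES_PER_BLOCK pages compressed in a lump,
 * and the unused page_flags member is dropped.
 */
#define PAGES_PER_BLOCK 16                      /* example value */

typedef struct block_desc {
        off_t           offset;                 /* offset of the compressed block data */
        unsigned int    size;                   /* compressed size of the whole block */
        unsigned int    flags;                  /* compression type, etc. */
} block_desc_t;

With 24 bytes per 4KB page, the descriptor table costs about 6 GB per 1 TB
of RAM (2^28 pages * 24 bytes); with one 16-byte descriptor per 16 pages it
would shrink to about 256 MB (2^24 descriptors * 16 bytes).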
Thanks
Atsushi Kumagai

>>>For example, how about another simple progress information:
>>>
>>>    pfn / max_mapnr
>>>
>>>where pfn is the number of a page frame that is currently
>>>processed. We know max_mapnr from the beginning, so this is possible
>>>within one pass. It's less precise but might be precise enough.
>>
>> I also think it's enough for progress information, but anyway the 1st
>> pass is necessary as above.
>>
>>
>> Thanks
>> Atsushi Kumagai
>--
>Thanks.
>HATAYAMA, Daisuke

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: zlib_compress.c
URL: <http://lists.infradead.org/pipermail/kexec/attachments/20150515/a46cc303/attachment.c>
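Since the attachment itself was scrubbed by the list archive, the following
is only a rough, hypothetical sketch of a compress2() block-size benchmark
in the spirit of zlib_compress.c, not the original program. It assumes the
first argument is the block size in KB, which would match the 1MB/10MB/100MB
blocks mentioned in the mail:

/*
 * Hypothetical sketch of a compress2() block-size benchmark; NOT the
 * original zlib_compress.c.  Reads the input file in blocks of the
 * given size and sums only the time spent inside compress2().
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <zlib.h>

int main(int argc, char **argv)
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s <block_size_KB> <file>\n", argv[0]);
                return 1;
        }

        size_t block = (size_t)atol(argv[1]) * 1024;
        FILE *fp = fopen(argv[2], "rb");
        if (!fp) {
                perror("fopen");
                return 1;
        }

        unsigned char *in  = malloc(block);
        unsigned char *out = malloc(compressBound(block));  /* worst-case size */
        if (!in || !out) {
                fprintf(stderr, "out of memory\n");
                return 1;
        }

        double total = 0.0;
        size_t n;
        while ((n = fread(in, 1, block, fp)) > 0) {
                uLongf outlen = compressBound(block);
                struct timeval s, e;

                /* Time only the compression itself, not the file I/O. */
                gettimeofday(&s, NULL);
                if (compress2(out, &outlen, in, n, Z_DEFAULT_COMPRESSION) != Z_OK) {
                        fprintf(stderr, "compress2 failed\n");
                        return 1;
                }
                gettimeofday(&e, NULL);

                total += (e.tv_sec - s.tv_sec) + (e.tv_usec - s.tv_usec) / 1e6;
        }

        printf("TOTAL COMPRESSION TIME: %f\n", total);

        fclose(fp);
        free(in);
        free(out);
        return 0;
}

Something like "cc -O2 zlib_compress.c -o zlib_compress -lz" should build it,
and it can then be run as in the measurements quoted above.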