Atsushi,

Please see response below!

eric

On 07/11/2017 02:43 AM, Atsushi Kumagai wrote:
> Hello Eric,
>
>>> On 07/07/2017 04:09 AM, Atsushi Kumagai wrote:
>>>>> The PFN_EXCLUDED value is used to control at which point a run of
>>>>> zeros in the bitmap (zeros denote excluded pages) is large enough
>>>>> to warrant truncating the current output segment and creating a
>>>>> new output segment (containing non-excluded pages) in an ELF dump.
>>>>>
>>>>> If the run is smaller than PFN_EXCLUDED, then those excluded pages
>>>>> are still output in the ELF dump as part of the current output
>>>>> segment.
>>>>>
>>>>> By using smaller values of PFN_EXCLUDED, the resulting dump file
>>>>> size can be made smaller, because more excluded pages are actually
>>>>> removed from the resulting dump file.
>>>>>
>>>>> This patch adds the command line option --exclude-threshold=<value>
>>>>> to indicate the threshold. The default is 256, the legacy value
>>>>> of PFN_EXCLUDED. The smallest value permitted is 1.
>>>>>
>>>>> Using an existing vmcore, this was tested by the following:
>>>>>
>>>>> % makedumpfile -E -d31 --exclude-threshold=256 -x vmlinux vmcore newvmcore256
>>>>> % makedumpfile -E -d31 --exclude-threshold=4 -x vmlinux vmcore newvmcore4
>>>>>
>>>>> I utilize -d31 in order to exclude as many page types as possible,
>>>>> resulting in [significantly] smaller file sizes than the original
>>>>> vmcore.
>>>>>
>>>>> -rwxrwx--- 1 edevolde edevolde 4034564096 Jun 27 10:24 vmcore
>>>>> -rw------- 1 edevolde edevolde  119808156 Jul  6 13:01 newvmcore256
>>>>> -rw------- 1 edevolde edevolde  100811276 Jul  6 13:08 newvmcore4
>>>>>
>>>>> The use of a smaller value of PFN_EXCLUDED increases the number of
>>>>> output segments (the 'Number of program headers' in the readelf
>>>>> output) in the ELF dump file.
>>>>
>>>> How will you tune the value? I'm not sure what the benefit of a
>>>> tunable PFN_EXCLUDED is.
>>>> If there is no regression caused by too many PT_LOAD entries,
>>>> I think we can decide on a concrete PFN_EXCLUDED.
>>>
>>> Allow me to note two things prior to addressing the question.
>>>
>>> Note that the value for PFN_EXCLUDED really should be in the range:
>>>
>>>     1 <= PFN_EXCLUDED <= NUM_PAGES(largest segment)
>>>
>>> Values larger than NUM_PAGES(largest segment) behave the same as
>>> NUM_PAGES(largest segment) and simply prevent makedumpfile from ever
>>> omitting excluded pages from the dump file.
>>>
>>> Also note that the ELF header allows for a 16-bit e_phnum value for
>>> the number of segments in the dump file. As it stands today, I doubt
>>> that anybody has come close to reaching 65535 segments, but given the
>>> combination of larger and larger memories and the work we (Oracle)
>>> are doing to further enhance the capabilities of makedumpfile, I
>>> believe we will start to challenge this 65535 limit.
>
> I overlooked the limitation on the number of segments, so I considered
> only "the first benefit" you describe below.
>
>>> The ability to tune PFN_EXCLUDED allows one to minimize file size
>>> while still staying within ELF boundaries.
>>>
>>> There are two ways in which having PFN_EXCLUDED as a tunable
>>> parameter benefits the user.
>>>
>>> The first benefit is that, when making PFN_EXCLUDED smaller,
>>> makedumpfile has more opportunities to NOT write excluded pages to
>>> the resulting dump file, thus obtaining a smaller overall dump file
>>> size. And since a PT_LOAD header is smaller than a page, the penalty
>>> of more segments will always still result in a smaller file size.
>>> (In the example I cite, the dump file was 18MB smaller with a
>>> PFN_EXCLUDED value of 4 than with the default 256, in spite of the
>>> number of segments increasing from 6 to 244.)
>>>
>>> The second benefit is that, by allowing PFN_EXCLUDED to become
>>> larger, makedumpfile can continue to generate valid ELF dump files
>>> on larger and larger memory systems. Generally speaking, the goal is
>>> to minimize the size of dump files via the exclusion of
>>> uninteresting pages (i.e. zero, free, etc.), especially as the size
>>> of memory continues to grow. As memory increases, there are more and
>>> more of these uninteresting pages, and more opportunities for
>>> makedumpfile to omit them (even at the current PFN_EXCLUDED value of
>>> 256). Furthermore, we are working on additional page exclusion
>>> strategies that will create still more opportunities for
>>> makedumpfile to omit pages from the dump file. And as makedumpfile
>>> omits more and more pages, the number of segments needed increases.
>>>
>>> By enabling a user to tune the value of PFN_EXCLUDED, we provide an
>>> additional mechanism to balance the size of the ELF dump file with
>>> respect to the size of memory.
>>
>> It occurred to me that offering the option "--exclude-threshold=auto",
>> whereby a binary search on the second bitmap in the function
>> get_loads_dumpfile_cyclic() determines the optimum value of
>> PFN_EXCLUDED (optimum here meaning the smallest possible value while
>> still staying within 65535 segments, which would yield the smallest
>> possible dump file size for the given constraints), would be an
>> excellent feature to have.
>
> I think "auto" is necessary for --exclude-threshold; the optimum
> value should be calculated automatically. Otherwise, it imposes
> trial-and-error on users every time, which doesn't sound practical.
> IOW, this patch is unacceptable if there is no mechanism to support
> users. So now, my only concern for this option is the processing time
> of the binary search.

OK, so the idea of "tuning" the value of PFN_EXCLUDED is agreeable, great!
I will work on the binary search and report back with measurements on
the processing time of 'crash'. From there we can determine if the
benefit is worthwhile.

Regards,

eric

>
> [snip]
>>>>> And with the larger number of segments, loading both vmcore and
>>>>> newvmcore4 into 'crash' resulted in identical outputs when run
>>>>> with the dmesg, ps, files, mount, and net sub-commands.
>>>>
>>>> What about the processing speed of crash, is there no slowdown?
>>>
>>> I did not observe a noticeable change in the processing speed of
>>> crash.
>
> Good, though it would be better to back this with actual measured
> results.
>
> Thanks,
> Atsushi Kumagai