From: Atsushi Kumagai <kumagai-atsushi@xxxxxxxxxxxxxxxxx>
Subject: RE: [PATCH v2 0/5] makedumpfile: --split: assign fair I/O workloads in appropriate time
Date: Mon, 27 Oct 2014 07:51:56 +0000

> Hello Zhou,
>
>> On 10/17/2014 11:50 AM, Atsushi Kumagai wrote:
>>> Hello,
>>>
>>> The code looks good to me, thanks Zhou.
>>> Now, I have a question on performance.
>>>
>>>> The issue is discussed at http://lists.infradead.org/pipermail/kexec/2014-March/011289.html
>>>>
>>>> This patch implements the idea of a 2-pass algorithm with a small amount of memory to manage the splitblock table.
>>>> Strictly speaking, the algorithm is still 3-pass, but the time of the second pass is much shorter.
>>>> The tables below show the performance with different sizes of cyclic-buffer and splitblock.
>>>> The test is executed on a machine with 128G of memory.
>>>>
>>>> The value is the total time (including the first pass and the second pass).
>>>> The value in brackets is the time of the second pass.
>>>
>>> Do you have any idea why the time of the second pass is much larger when
>>> the splitblock-size is 2G? I worry about the scalability.
>>>
>> Hello,
>>
>> Since the previous machine can't be used for some reasons, I tested several times using the latest code
>> on other machines, but that never happened. It seems that everything is right. Tests are executed on two machines (a server and a PC).
>> Tests are based on:
>
> Well... OK, I'll take that as an issue specific to that machine
> (or your mistakes as you said).
> Now I have another question.
>
> calculate_end_pfn_by_splitblock():
>     ...
>     /* deal with incomplete splitblock */
>     if (pfn_needed_by_per_dumpfile < 0) {
>         --*current_splitblock;
>         splitblock_inner -= splitblock->entry_size;
>         end_pfn = CURRENT_SPLITBLOCK_PFN_NUM;
>         *current_splitblock_pfns = (-1) * pfn_needed_by_per_dumpfile;
>         pfn_needed_by_per_dumpfile += read_value_from_splitblock_table(splitblock_inner);
>         end_pfn = calculate_end_pfn_in_cycle(CURRENT_SPLITBLOCK_PFN_NUM,
>                                              CURRENT_SPLITBLOCK_PFN_NUM + splitblock->page_per_splitblock,
>                                              end_pfn, pfn_needed_by_per_dumpfile);
>     }
>
> This block causes re-scanning of the cycle corresponding to the
> current_splitblock, so the larger the cyclic-buffer, the longer this takes.
> If cyclic-buffer is 4096 (which means the number of cycles is 1), the whole page
> scanning will be done in the second pass. Actually, the performance with
> cyclic-buffer=4096 was quite bad.
>
> Is this process necessary? I think splitting splitblocks is overkill,
> because as I understand it, splitblock-size is the granularity of I/O
> fairness, and tuning splitblock-size is a trade-off between fairness and
> memory usage.
> However, there is no advantage to reducing splitblock-size in the current
> implementation; it just consumes a large amount of memory.
> If we remove this process, we can avoid the whole page scanning in
> the second pass, and reducing splitblock-size will become meaningful as I
> expected.

Yes, I don't think this rescan fits the splitblock method either.
The idea of the splitblock method is to reduce the number of filtering
passes from three to two at the expense of at most a splitblock-size
difference between dump files. Doing a rescan here doesn't fit that idea.

--
Thanks.
HATAYAMA, Daisuke
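P.S. To make the trade-off above concrete, here is a minimal, self-contained
sketch of the idea (toy names and a toy table, not the actual makedumpfile
code or its real interfaces): end_pfn for each split dump file is taken
straight from the per-splitblock dumpable-page counts collected in the first
pass, cutting at a splitblock boundary instead of re-scanning pages inside an
incomplete splitblock.

#include <stdio.h>

#define NR_SPLITBLOCKS       8          /* toy table size                 */
#define PAGES_PER_SPLITBLOCK 1024ULL    /* splitblock-size in pages (toy) */

/* dumpable-page count per splitblock, as recorded during the first pass */
static unsigned long long splitblock_table[NR_SPLITBLOCKS] = {
        900, 100, 1024, 512, 0, 1024, 300, 700
};

/*
 * Consume whole splitblocks until this dump file has at least "quota"
 * dumpable pages, then cut at the splitblock boundary.  A file can
 * overshoot its quota by at most one splitblock worth of pages, which
 * is exactly the fairness/memory trade-off discussed above.
 */
static unsigned long long
end_pfn_by_splitblock(int *cur, long long quota)
{
        while (*cur < NR_SPLITBLOCKS && quota > 0)
                quota -= (long long)splitblock_table[(*cur)++];

        return (unsigned long long)*cur * PAGES_PER_SPLITBLOCK;
}

int main(void)
{
        long long total = 0;
        int cur = 0, i;

        for (i = 0; i < NR_SPLITBLOCKS; i++)
                total += (long long)splitblock_table[i];

        /* split into 3 dump files of roughly total/3 dumpable pages each */
        for (i = 0; i < 3; i++)
                printf("dumpfile %d: end_pfn = %llu\n",
                       i, end_pfn_by_splitblock(&cur, total / 3));

        return 0;
}

With a scheme like this, a dump file can exceed its fair share by at most one
splitblock worth of pages, but the second pass needs no page scanning at all,
so reducing splitblock-size only improves fairness without reintroducing the
rescan cost.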