On 12/04/2015 04:56 PM, Chao Fan wrote:
> Hi Zhou Wenjian and Kumagai,
>
> I have followed Zhou Wenjian's suggestion and done some tests. With "-c",
> makedumpfile 1.5.9 does perform better than with "-l".
>
> I have done more tests on a machine with 128G of memory. With "-d 0" and
> "-d 3", makedumpfile 1.5.9 performs well. But with "--num-threads 1",
> it needs more time than without "--num-threads".
>
> Here are my results (makedumpfile -c):
>
> "-d 0" (the size of vmcore is 2.6G):
> --num-threads    time(seconds)
> 0                556
> 1                1186
> 4                307
> 8                186
> 12               131
> 16               123
>
>
> "-d 3" (the size of vmcore is 1.3G):
> --num-threads    time(seconds)
> 0                141
> 1                262
> 2                137
> 4                91
> 8                121
> 16               137
>

Hello Chao,

This result also does not look so good.
We have tested it, and you can refer to:
http://lists.infradead.org/pipermail/kexec/2015-October/014576.html

Could you collect the information with *perf stat -e page-faults* for
both --num-threads 0 and --num-threads 1?
Your result looks like the performance without the patch that divides
compress2().

--
Thanks
Zhou

> So, I think makedumpfile 1.5.9 can save time when "-c" is used, as long
> as neither "-d 31" nor "--num-threads 1" is specified.
>
> ----- Original Message -----
>> From: "Wenjian Zhou" <zhouwj-fnst at cn.fujitsu.com>
>> To: "Atsushi Kumagai" <ats-kumagai at wm.jp.nec.com>
>> Cc: kexec at lists.infradead.org
>> Sent: Friday, December 4, 2015 11:33:36 AM
>> Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
>>
>> Hello Kumagai,
>>
>> On 12/04/2015 10:30 AM, Atsushi Kumagai wrote:
>>> Hello, Zhou
>>>
>>>> On 12/02/2015 03:24 PM, Dave Young wrote:
>>>>> Hi,
>>>>>
>>>>> On 12/02/15 at 01:29pm, "Zhou, Wenjian" wrote:
>>>>>> I think there is no problem if other test results are as expected.
>>>>>>
>>>>>> --num-threads mainly reduces the time spent compressing.
>>>>>> So for lzo, it can't help much most of the time.
>>>>>
>>>>> Seems the help of --num-threads does not say it exactly:
>>>>>
>>>>> [--num-threads THREADNUM]:
>>>>>     Using multiple threads to read and compress data of each page in parallel.
>>>>>     And it will reduces time for saving DUMPFILE.
>>>>>     This feature only supports creating DUMPFILE in kdump-comressed format from
>>>>>     VMCORE in kdump-compressed format or elf format.
>>>>>
>>>>> Lzo is also a compression method, so it should be mentioned that
>>>>> --num-threads only supports zlib compressed vmcore.
>>>>>
>>>>
>>>> Sorry, it seems that something I said was not so clear.
>>>> lzo is also supported. Since lzo compresses data at a high speed, the
>>>> performance improvement is not so obvious most of the time.
>>>>
>>>>> It is also worth mentioning the recommended -d value for this feature.
>>>>>
>>>>
>>>> Yes, I think it's worth mentioning. I forgot it.
>>>
>>> I saw your patch, but I think I should confirm what the problem is first.
>>>
>>>> However, when "-d 31" is specified, it will be worse.
>>>> Fewer than 50 buffers are used to cache the compressed pages.
>>>> And even if a page has been filtered, it still takes a buffer.
>>>> So if "-d 31" is specified, the filtered pages will use a lot
>>>> of buffers. Then the pages which need to be compressed can't
>>>> be compressed in parallel.
>>>
>>> Could you explain in more detail why compression will not be parallel?
>>> Actually, the buffers are also used for filtered pages, which sounds
>>> inefficient.
>>> However, I don't understand why it prevents parallel compression.
>>>
>>
>> Think about this: in a huge memory, most of the pages will be filtered, and
>> we have 5 buffers.
>>
>> page1       page2     page3     page4     page5     page6       page7 .....
>> [buffer1]   [2]       [3]       [4]       [5]
>> unfiltered  filtered  filtered  filtered  filtered  unfiltered  filtered
>>
>> Since a filtered page takes a buffer, when compressing page1,
>> page6 can't be compressed at the same time.
>> That is why it prevents parallel compression.
>>
>>> Further, according to Chao's benchmark, there is a big performance
>>> degradation even if the number of threads is 1. (58s vs 240s)
>>> The current implementation seems to have some problems, and we should
>>> solve them.
>>>
>>
>> If "-d 31" is specified, on the one hand we can't save time by compressing
>> in parallel, and on the other hand we introduce some extra work by adding
>> "--num-threads". So it is obvious that there will be a performance
>> degradation.
>>
>> I'm not so sure whether it is a problem that the performance degradation
>> is so big.
>> But I think if it works as expected in other cases, this won't be a problem
>> (or not a problem that needs to be fixed), since the performance
>> degradation is expected in theory.
>>
>> Or the current implementation should be replaced by a new algorithm.
>> For example:
>> We can add an array to record whether each page is filtered or not.
>> And only the unfiltered pages will take a buffer.
>>
>> But I'm not sure if it is worth it.
>> Since "-l -d 31" is fast enough, the new algorithm can't help much either.
>>
>> --
>> Thanks
>> Zhou
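
The 5-buffer example quoted above, and the alternative of letting only
unfiltered pages take a buffer, can be modeled with a toy simulation.
This is only a minimal sketch under simplifying assumptions: buffers are
handed out strictly in page order starting from page1, and the function
and variable names are hypothetical, not taken from the makedumpfile
source.

```python
def pages_holding_buffers(filtered, num_buffers, filtered_takes_buffer):
    """Return the 1-based numbers of unfiltered pages that hold a buffer.

    Toy model (an assumption, not makedumpfile's real code): buffers are
    handed out in page order until all num_buffers are in use.
    """
    holders = []
    used = 0
    for page_no, is_filtered in enumerate(filtered, start=1):
        if used == num_buffers:
            break  # every buffer is occupied; later pages must wait
        if is_filtered and not filtered_takes_buffer:
            continue  # only noted in the side array; no buffer consumed
        used += 1
        if not is_filtered:
            holders.append(page_no)  # this page can be compressed now
    return holders


# Layout from the example: page1 and page6 are unfiltered, the rest filtered.
filtered = [False, True, True, True, True, False, True]

current = pages_holding_buffers(filtered, 5, filtered_takes_buffer=True)
proposed = pages_holding_buffers(filtered, 5, filtered_takes_buffer=False)

print(current)   # [1]
print(proposed)  # [1, 6]
```

In this model, with filtered pages taking buffers, only page1 is available
for compression while page6 waits for a free buffer. If only unfiltered
pages take buffers, page1 and page6 hold buffers at the same time and could
be compressed in parallel.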