Hi Zhou Wenjian and Kumagai,

Following Zhou Wenjian's advice I ran some tests, and with "-c",
makedumpfile 1.5.9 does perform better than with "-l". I have done more
tests on a machine with 128G of memory: with "-d 0" and "-d 3",
makedumpfile 1.5.9 performs well, but with "--num-threads 1" it needs
more time than without "--num-threads".

Here are my results (makedumpfile -c):

"-d 0" (vmcore size: 2.6G):

  --num-threads    time (seconds)
        0               556
        1              1186
        4               307
        8               186
       12               131
       16               123

"-d 3" (vmcore size: 1.3G):

  --num-threads    time (seconds)
        0               141
        1               262
        2               137
        4                91
        8               121
       16               137

So I think makedumpfile 1.5.9 can save time with "-c", as long as
neither "-d 31" nor "--num-threads 1" is used.

----- Original Message -----
> From: "Wenjian Zhou" <zhouwj-fnst at cn.fujitsu.com>
> To: "Atsushi Kumagai" <ats-kumagai at wm.jp.nec.com>
> Cc: kexec at lists.infradead.org
> Sent: Friday, December 4, 2015 11:33:36 AM
> Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
>
> Hello Kumagai,
>
> On 12/04/2015 10:30 AM, Atsushi Kumagai wrote:
> > Hello, Zhou
> >
> >> On 12/02/2015 03:24 PM, Dave Young wrote:
> >>> Hi,
> >>>
> >>> On 12/02/15 at 01:29pm, "Zhou, Wenjian" wrote:
> >>>> I think there is no problem if the other test results are as
> >>>> expected.
> >>>>
> >>>> --num-threads mainly reduces the compression time.
> >>>> So for lzo, it can't help much most of the time.
> >>>
> >>> The help text of --num-threads does not seem to say this exactly:
> >>>
> >>>   [--num-threads THREADNUM]:
> >>>       Using multiple threads to read and compress the data of each
> >>>       page in parallel will reduce the time for saving DUMPFILE.
> >>>       This feature only supports creating DUMPFILE in
> >>>       kdump-compressed format from VMCORE in kdump-compressed
> >>>       format or elf format.
> >>>
> >>> lzo is also a compression method; it should be mentioned that
> >>> --num-threads only supports zlib-compressed vmcores.
> >>>
> >>
> >> Sorry, it seems that something I said was not clear.
> >> lzo is also supported. But since lzo compresses data at high speed,
> >> the performance improvement is usually not very noticeable.
> >>
> >>> It is also worth mentioning the recommended -d value for this
> >>> feature.
> >>>
> >>
> >> Yes, I think it's worth mentioning. I forgot it.
> >
> > I saw your patch, but I think I should confirm what the problem is
> > first.
> >
> >> However, when "-d 31" is specified, it will be worse.
> >> Fewer than 50 buffers are used to cache the compressed pages,
> >> and even a page that has been filtered out takes a buffer.
> >> So if "-d 31" is specified, the filtered pages will use a lot
> >> of buffers, and then the pages which need to be compressed can't
> >> be compressed in parallel.
> >
> > Could you explain in more detail why compression will not be
> > parallel? The buffers being used for filtered pages as well sounds
> > inefficient, but I don't understand why it prevents parallel
> > compression.
> >
>
> Consider this: on a machine with huge memory, most of the pages will
> be filtered, and suppose we have 5 buffers:
>
>   page1       page2     page3     page4     page5     page6       page7 ...
>   [buffer1]   [2]       [3]       [4]       [5]
>   unfiltered  filtered  filtered  filtered  filtered  unfiltered  filtered
>
> Since a filtered page also takes a buffer, page6 can't be compressed
> while page1 is being compressed.
> That's why it prevents parallel compression.
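If I read the above correctly, the behavior can be simulated like this
(a minimal sketch in C; NR_BUFFERS, is_filtered and the other names are
invented for illustration, not the actual makedumpfile identifiers, and
the real code is of course threaded):

  #include <stdio.h>
  #include <stdbool.h>

  #define NR_BUFFERS 5   /* buffers caching pages, as in the diagram */
  #define NR_PAGES   7

  /* In the example only page1 and page6 need compression. */
  static bool is_filtered(int page) { return page != 1 && page != 6; }

  int main(void)
  {
      int in_flight = 0;      /* pages currently holding a buffer   */
      int compressible = 0;   /* how many of those need compression */

      for (int page = 1; page <= NR_PAGES; page++) {
          if (in_flight == NR_BUFFERS) {
              printf("page%d must wait: all %d buffers busy, only %d "
                     "of them hold pages that need compression\n",
                     page, NR_BUFFERS, compressible);
              break;
          }
          in_flight++;        /* filtered or not, a slot is consumed */
          if (!is_filtered(page))
              compressible++;
          printf("page%d -> buffer%d (%s)\n", page, in_flight,
                 is_filtered(page) ? "filtered" : "needs compression");
      }
      return 0;
  }

Pages 1-5 fill all five buffers, so page6 has to wait although only
page1 keeps a compression thread busy; the other threads sit idle.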
> > Further, according to Chao's benchmark, there is a big performance
> > degradation even if the number of threads is 1 (58s vs 240s).
> > The current implementation seems to have some problems; we should
> > solve them.
> >
>
> If "-d 31" is specified, then on the one hand we can't save time
> through parallel compression, and on the other hand "--num-threads"
> introduces some extra work. So it is obvious that there will be some
> performance degradation.
>
> I'm not so sure whether the degradation being this big is itself a
> problem. If the other cases work as expected, I think this is not a
> problem (or not a problem that needs to be fixed), since the
> degradation exists in theory anyway.
>
> Otherwise, the current implementation should be replaced by a new
> algorithm. For example: we can add an array recording whether each
> page is filtered or not, so that only unfiltered pages take a buffer.
>
> But I'm not sure it is worth it.
> Since "-l -d 31" is already fast enough, the new algorithm can't help
> much there either.
>
> --
> Thanks
> Zhou
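Zhou's proposed bookkeeping would, as I read it, look roughly like this
(again only a sketch with invented names, not a patch against
makedumpfile):

  #include <stdio.h>
  #include <stdbool.h>

  #define NR_BUFFERS 5
  #define NR_PAGES   7

  static bool is_filtered(int page) { return page != 1 && page != 6; }

  int main(void)
  {
      /* The proposed array: one flag per page, set while filtering. */
      bool filtered[NR_PAGES + 1];
      int in_flight = 0;

      for (int page = 1; page <= NR_PAGES; page++) {
          filtered[page] = is_filtered(page);
          if (filtered[page])
              continue;           /* filtered pages take no buffer */
          if (in_flight < NR_BUFFERS) {
              in_flight++;
              printf("page%d -> buffer%d (compressed in parallel)\n",
                     page, in_flight);
          }
      }
      return 0;
  }

With the same input as before, page1 and page6 now hold buffer1 and
buffer2 at the same time, so two compression threads can run in
parallel; the writer would still consult filtered[] to emit the pages
in their original order.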