>Hello Atsushi Kumagai,
>
>I have tried a lot, and I think the big performance degradation only
>occurs on particular CPUs.
>I have thought of two possible reasons, and I need your help to confirm
>which is the real one.
>
>The following tests can also be run against a dumpfile instead of /proc/vmcore.
>
>Test 1: determine whether the degradation is caused by multi-threading.
>apply patch: test1
>command1: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>command2: ./makedumpfile -c /proc/vmcore vmcore --num-threads 8
>
>It would be better to run some tests with -l (lzo) too.
>command1: ./makedumpfile -l /proc/vmcore vmcore
>command2: ./makedumpfile -l /proc/vmcore vmcore --num-threads 1
>command3: ./makedumpfile -l /proc/vmcore vmcore --num-threads 8
>
>Test 2: determine whether the degradation is caused by doing the compression in a thread.
>2.1:
>  apply patch: test2.1
>  command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>2.2:
>  apply patch: test2.2
>  command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>
>Thanks a lot.
>
>BTW, could you show me the CPU name, zlib version and glibc version?

Here is my environment; this time I tested on the 1st kernel on a physical machine:

  CPU:   Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz (80 cores)
  zlib:  1.2.3-29
  glibc: 2.12-1.149

Here is a summary of the results; the full logs are attached:

Test1:
  | threads | compress time (s) | exec time (s) |
  |       1 |             82.75 |         95.21 |
  |       8 |         34.1~34.8 |         40.32 |

Test1 (lzo):
  | threads | compress time (s) | exec time (s) |
  |       - |                 - |         30.33 |
  |       1 |                 - |         33.10 |
  |       8 |                 - |         28.77 |

Test2-1:
  | threads | compress time (s) | exec time (s) |
  |       1 |             76.12 |         82.13 |

Test2-2:
  | threads | compress time (s) | exec time (s) |
  |       1 |             41.97 |         51.46 |

The results are better than on the virtual machine, but still worse than
without creating threads.

Additionally, I collected profiles with perf for debugging, as below:

Test2-1:
  38.93%  makedumpfile-2-  libz.so.1.2.3       [.] 0x0000000000006e30   // compress2(?)
   5.16%  makedumpfile-2-  [kernel.kallsyms]   [k] page_fault
   5.02%  makedumpfile-2-  [kernel.kallsyms]   [k] clear_page_c
   4.96%  makedumpfile-2-  libc-2.12.so        [.] __memset_sse2
   4.64%  makedumpfile-2-  [kernel.kallsyms]   [k] tick_program_event
   3.77%  makedumpfile-2-  libz.so.1.2.3       [.] adler32
   2.08%  makedumpfile-2-  libc-2.12.so        [.] memcpy
   1.95%  makedumpfile-2-  [kernel.kallsyms]   [k] __alloc_pages_nodemask
   1.54%  makedumpfile-2-  [kernel.kallsyms]   [k] get_page_from_freelist
   1.53%  makedumpfile-2-  [kernel.kallsyms]   [k] tick_dev_program_event
   1.32%  makedumpfile-2-  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
   1.08%  makedumpfile-2-  [kernel.kallsyms]   [k] hrtimer_interrupt
   0.94%  makedumpfile-2-  [kernel.kallsyms]   [k] release_pages
   0.85%  makedumpfile-2-  [kernel.kallsyms]   [k] alloc_pages_vma
   ...

Test2-2:
  66.46%  makedumpfile-2-  libz.so.1.2.3       [.] 0x000000000000564e   // compress2(?)
   6.71%  makedumpfile-2-  libz.so.1.2.3       [.] adler32
   5.56%  makedumpfile-2-  libc-2.12.so        [.] __memset_sse2
   3.44%  makedumpfile-2-  libc-2.12.so        [.] memcpy
   2.85%  makedumpfile-2-  [kernel.kallsyms]   [k] tick_dev_program_event
   1.30%  makedumpfile-2-  libz.so.1.2.3       [.] _tr_flush_block
   0.95%  makedumpfile-2-  [kernel.kallsyms]   [k] tick_program_event
   0.92%  makedumpfile-2-  libz.so.1.2.3       [.] _tr_init
   0.69%  makedumpfile-2-  [kernel.kallsyms]   [k] hrtimer_interrupt
   0.63%  makedumpfile-2-  [kernel.kallsyms]   [k] page_fault
   0.34%  makedumpfile-2-  libpthread-2.12.so  [.] __lll_unlock_wake
   0.33%  makedumpfile-2-  libc-2.12.so        [.] _int_malloc
   0.33%  makedumpfile-2-  libc-2.12.so        [.] _int_free
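For reference, what Test2-1 and Test2-2 compare is whether the same
compression work runs in a thread created with pthread_create() or is
called directly. A minimal standalone sketch of that shape follows; this
is not the actual test2.1/test2.2 patch code, and the page data, buffer
sizes, page count and compression level are placeholders chosen only for
illustration:

  /* Sketch: the same zlib compression loop, run either in a worker
   * thread (pthread_create, like Test2-1) or by a direct call
   * (like Test2-2).  Not the actual makedumpfile patches. */
  #include <pthread.h>
  #include <string.h>
  #include <zlib.h>

  #define PAGE_SIZE  4096
  #define NPAGES     100000              /* placeholder workload size */

  static void *compress_loop(void *arg)
  {
          static unsigned char in[PAGE_SIZE];
          static unsigned char out[PAGE_SIZE * 2];
          uLongf outlen;
          long i;

          memset(in, 0x5a, sizeof(in));  /* dummy, highly compressible page */
          for (i = 0; i < NPAGES; i++) {
                  outlen = sizeof(out);
                  compress2(out, &outlen, in, sizeof(in), Z_BEST_SPEED);
          }
          return arg;
  }

  int main(int argc, char *argv[])
  {
          pthread_t tid;

          if (argc > 1 && strcmp(argv[1], "--thread") == 0) {
                  /* Test2-1 style: the compression runs in a created thread. */
                  pthread_create(&tid, NULL, compress_loop, NULL);
                  pthread_join(tid, NULL);
          } else {
                  /* Test2-2 style: the same routine is called directly. */
                  compress_loop(NULL);
          }
          return 0;
  }

Built with something like "gcc -O2 compress_sketch.c -lz -lpthread" and
timed with time(1), the two modes compare just the thread-versus-direct-call
cost, independent of the rest of makedumpfile.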
In the case of Test2-1 (using pthread_create), more CPU time was spent in
some kernel functions such as page_fault() than in Test2-2 (calling
kdump_thread_function_cyclic() directly); that may be one of the causes of
the degradation. I haven't found the reason yet, this is just for your
information.


Thanks,
Atsushi Kumagai

>--
>Thanks
>Zhou Wenjian
>
>On 06/30/2015 05:06 PM, Atsushi Kumagai wrote:
>>> On 06/26/2015 03:49 PM, Atsushi Kumagai wrote:
>>>> I attached 5 processors to the VM and I confirmed that all threads
>>>> consumed full cpu time by top(1) on the host:
>>>>
>>>>   PID USER   PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>>>> 17614 qemu   20   0 5792m 4.9g 5652 R 435.1 72.8  29:02.17 qemu-kvm
>>>>
>>>> So I think the performance must be improved...
>>>
>>> Since I can't get that result on all machines here, could you test it with
>>> the patch "time" and show me the output?
>>> Using "./makedumpfile -c --num-threads 4 /proc/vmcore dumpfile" is OK.
>>>
>>> The attachment is the patch "time".
>>
>> Here is the result:
>>
>> / # makedumpfile -c --num-threads 4 /proc/vmcore /mnt/dumpfile
>> Copying data                       : [100.0 %] |
>> Copying data                       : [100.0 %] \
>>
>> The dumpfile is saved to /mnt/dumpfile.
>>
>> makedumpfile Completed.
>> lock time: 310s935500us
>> write time: 3s970037us
>> hit time: 6s316043us
>> find time: 317s926654us
>> loop_time: 37s321800us
>> thread consume_time: 0s0us
>> thread timea: 0s0us
>> thread timeb: 0s0us
>> read_time[0]: 8s637011us
>> lock_current_time[0]: 0s284428us
>> found_time[0]: 60s366795us
>> lock_consumed_time[0]: 2s782596us
>> compress_time[0]: 301s427073us
>> read_time[1]: 8s435914us
>> lock_current_time[1]: 0s271680us
>> found_time[1]: 60s329026us
>> lock_consumed_time[1]: 2s849061us
>> compress_time[1]: 302s98620us
>> read_time[2]: 8s380550us
>> lock_current_time[2]: 0s270388us
>> found_time[2]: 60s209376us
>> lock_consumed_time[2]: 3s297574us
>> compress_time[2]: 301s486768us
>> read_time[3]: 8s550662us
>> lock_current_time[3]: 0s278997us
>> found_time[3]: 60s476702us
>> lock_consumed_time[3]: 3s49184us
>> compress_time[3]: 301s718390us
>> count1: 172
>> count2: 70921401
>> count3: 0
>> count4: 0
>> count5: 0
>> count6: 0
>> count7: 0
>> exec time: 380s125494us
>>
>>
>> BTW, I fixed a small mistake before testing like:
>>
>> - write_time = (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
>> + write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);

[Non-text attachments scrubbed by the list archive:
  test1.log     <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0004.obj>
  test1-lzo.log <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0005.obj>
  test2-1.log   <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0006.obj>
  test2-2.log   <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0007.obj>]
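One side note on the write_time fix quoted above: since the interval is
measured around each individual write, it has to be accumulated across
iterations with "+=" rather than overwritten with "=", which would keep
only the last interval. A minimal standalone illustration of that
gettimeofday() pattern, reusing the variable names from the quoted diff
but with a usleep() stand-in for the real write (not the actual patch):

  /* Illustration of the accumulating gettimeofday() timing pattern
   * from the quoted write_time fix; the loop and usleep() are
   * placeholders standing in for the real write path. */
  #include <stdio.h>
  #include <sys/time.h>
  #include <unistd.h>

  int main(void)
  {
          struct timeval write1, write2;
          long write_time = 0;            /* accumulated microseconds */
          int i;

          for (i = 0; i < 3; i++) {
                  gettimeofday(&write1, NULL);
                  usleep(10000);          /* stand-in for the real write */
                  gettimeofday(&write2, NULL);
                  /* "+=" accumulates every interval; "=" kept only the last one */
                  write_time += (write2.tv_sec - write1.tv_sec) * 1000000
                              + (write2.tv_usec - write1.tv_usec);
          }
          printf("write time: %lds%ldus\n",
                 write_time / 1000000, write_time % 1000000);
          return 0;
  }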