Hi Tao, On Sat, 25 Mar 2023 12:12:12 +0800 Tao Liu <ltao@xxxxxxxxxx> wrote: > The primary part of the patchset will introduce multithread support for search > cmd to improve its performance. A search operation is mainly made up with 2 > steps: 1) readmem data into pagebuf, 2) search specific values within the > pagebuf. A typical workflow of search is as follows: > > for addr from low to high: > do > readmem(addr, pagebuf) > search_value(value, pagebuf) > addr += pagesize > done > > There are 2 points which we can accelerate: 1) readmem don't have to wait > search_value, when search_value is working, readmem can read the next pagebuf > at the same time. 2) higher addr don't have to wait lower addr, they can be > processed at the same time if we carefully arrange the output order. > > For point 1, we introduce zones for pagebuf, e.g. search_value can work on > zone 0 while readmem can prepare the data for zone 1. For point 2, we introduce > multiple search_value in threads, e.g. readmem will prepare 100 pages as a > batch, then we will have 4 threads of search_value, thread 0 handles page 1~25, > thread 2 handles page 26~50 page, thread 3 handles page 51~75, thread 4 handles > page 76~100. > > A typical workflow of multithread search implemented in this patchset is as > follows, which removed thread synchronization: > > pagebuf[ZONE][BATCH] > zone_index = buf_index = 0 > create_thread(4, search_value) > for addr from low to high: > do > if buf_index < BATCH > readmem(addr, pagebuf[zone_index][buf_index++]) > addr += pagesize > else > start_thread(pagebuf[zone_index], 0/4 * BATCH, 1/4 * BATCH) > start_thread(pagebuf[zone_index], 1/4 * BATCH, 2/4 * BATCH) > start_thread(pagebuf[zone_index], 2/4 * BATCH, 3/4 * BATCH) > start_thread(pagebuf[zone_index], 3/4 * BATCH, 4/4 * BATCH) > zone_index++ > buf_index = 0 > fi > done > > readmem works in the main process and not multi-threaded, because readmem will > not only read data from vmcore, decompress it, but walk through page tables if > virtual address given. It is hard to reimplement it into thread safe version, > search_value is easier to be made thread-safe. By carefully choose batch size > and thread num, we can maximize the concurrency. > > The last part of the patchset, is replacing lseek/read to pread for kcore and > diskdumped vmcore. > > Here is the performance test result chart. Please note the vmcore and > kcore are tested seperately on 2 different machines. crash-orig is the > crash compiled with clean upstream code, crash-pread is the code with only > pread patch applied(patch 5), crash-multi is the code with only multithread > patches applied(patch 1~4). > > ulong search: > > $ time echo "search abcd" | ./crash-orig vmcore vmlinux > /dev/null > $ time echo "search abcd -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null > > 45G vmcore 64G kcore > real user sys real user sys > crash-orig 16m56.595s 15m57.188s 0m56.698s 1m37.982s 0m51.625s 0m46.266s > crash-pread 16m46.366s 15m55.790s 0m48.894s 1m9.179s 0m36.646s 0m32.368s > crash-multi 16m26.713s 19m8.722s 1m29.263s 1m27.661s 0m57.789s 0m54.604s > > string search: > > $ time echo "search -c abcddbca" | ./crash-orig vmcore vmlinux > /dev/null > $ time echo "search -c abcddbca -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null > > 45G vmcore 64G kcore > real user sys real user sys > crash-orig 33m33.481s 32m38.321s 0m52.771s 8m32.034s 7m50.050s 0m41.478s > crash-pread 33m25.623s 32m35.019s 0m47.394s 8m4.347s 7m35.352s 0m28.479s > crash-multi 16m31.016s 38m27.456s 1m11.048s 5m11.725s 7m54.224s 0m44.186s > > Discussion: > > 1) Either multithread and pread patches can improve the performance a > bit, so if both patches applied, the performance can be better. > > 2) Multi-thread search performs much better in search time consumptive > tasks, such as string search. > > Tao Liu (5): > Introduce multi-thread to search_virtual > Introduce multi-thread to search_physical > Introduce multi-thread to string search > Introduce multi-thread options to search cmd > Replace lseek/read into pread for kcore and vmcore reading. > > defs.h | 6 + > diskdump.c | 11 +- > help.c | 17 +- > memory.c | 1176 +++++++++++++++++++++++++++++++++++++++++----------- > netdump.c | 5 +- > task.c | 14 + > 6 files changed, 969 insertions(+), 260 deletions(-) > Great series! I must admit that I don't get all the details of the implementation, yet. But for what I understand I agree with Lianbo. In particular that the last patch is independent and can be merged without needing to wait for the rest. Furthermore it would be great if you could move the thread handling and some basic functionality (like work queues etc.) to a library. This will not only allow other parts of crash to reuse them when they get multi threaded but it will also make your code a lot easier to understand. For example look at all the local variables needed in search_{virtual,physical}. There were already more than a human brain can handle and you had to add quite a bunch more. Having that said, I've noticed that you had to make the same changes over and over again. For me that's usually a sign that multiple functions are doing the same and should be merged. For example take search_chars and search_chars_p. The body of functions are identical. The only difference between them is the return type. Similar the other search_{ulong,uint,ushort}{_p} functions or search_virtual and search_physical. This is an old problem of the code but when you need to touch it anyway it's a great opportunity for a cleanup ;-) Finally I don't understand the interface you are proposing. Especially the -f option seems much to complicated to me. Why can't the user simply pass the batch size (or if you prefer the batch size per thread)? At least for me that would be much more intuitive compared to some 'factor' that is later used to calculate the batch size. Furthermore I don't understand why the -c and -x options are mutual exclusive. The way I see it there shouldn't be much of a difference between the single and multi threaded case. Thanks Philipp -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility Contribution Guidelines: https://github.com/crash-utility/crash/wiki