On Fri, Feb 10, 2023 at 12:42:36AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Tao,
>
> On 2023/02/08 13:34, Tao Liu wrote:
> > Hello,
> >
> > Recently I made an attempt to introduce a thread pool for crash utility, to
> > optimize the performance of crash.
>
> Thank you for the attempt, interesting.
> What data made you try to speed up the collection of member offsets?
> First I'm interested in which routines weigh with crash startup.
>
> To be honest, personally I'm fairly satisfied with the current
> crash-8 startup time :) with commit cd8954023b (thanks to Hatayama-san)
> and maybe the benefit of GDB's parallel loading, which Andrew said.
>
> [root@rhel91u ~]# time echo quit | crash > /dev/null
>
> real    0m2.621s
> user    0m2.574s
> sys     0m0.112s
>
> So I would like to know first whether it's likely to worth looking
> into the slow part, especially if we have such complexity of
> multi-threading or another way.
>
> Thanks,
> Kazu
>

Hi Kazu,

Here is a comparison of the startup times of crash and drgn:

$ time echo "quit()" | drgn -c /tmp/maple/vmcore -s /tmp/maple/vmlinux > /dev/null
drgn 0.0.22+2.g33c3e36 (using Python 3.10.0, elfutils 0.185, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.common import *
>>> from drgn.helpers.linux import *
warning: could not get debugging information for:
kernel modules (could not read depmod: open: /lib/modules/5.19.0-uek+/modules.dep.bin: No such file or directory)

real    0m0.222s
user    0m1.032s
sys     0m0.041s

$ time echo quit | crash /tmp/maple/vmcore /tmp/maple/vmlinux > /dev/null

real    0m8.266s
user    0m7.853s
sys     0m0.248s

As you can see, there is a startup time difference between crash and drgn. I
haven't looked into drgn's source code to see whether drgn does the same
amount of startup work as crash.

Then I measured the time spent in each xx_init() function of crash, and
profiled the run with gprof:

$ crash /tmp/maple/vmcore /tmp/maple/vmlinux
gdb_session_init              23.761000
machdep_init POST_RELOC        0.060000
show_untrusted_files           0.000000
kdump_backup_region_init       0.001000
read_in_kernel_config          1.287000
kernel_init                 1724.044000   <<--- a lot of symbol resolving
machdep_init POST_GDB        851.425000
vm_init                     1993.168000   <<--- a lot of symbol resolving
machdep_init POST_VM         414.680000
module_init                  287.185000
help_init                      0.005000
task_init                    459.189000
vfs_init                     301.908000
net_init                     116.945000
dev_init                       6.494000
machdep_init POST_INIT         9.176000

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 14.47      1.12     1.12                             skip_ws(char const*&, char const*&, char const*)
 14.08      2.21     1.09                             lookup_partial_symbol(objfile*, partial_symtab*, lookup_name_info const&, int, domain_enum_tag)
 13.44      3.25     1.04                             strncmp_iw_with_mode(char const*, char const*, unsigned long, strncmp_iw_mode, language, completion_match_for_lcd*)
  5.81      3.70     0.45                             symbol_matches_search_name(general_symbol_info const*, lookup_name_info const&)
  4.52      4.05     0.35                             gdb::bcache::insert(void const*, int, bool*)
  4.26      4.38     0.33                             default_symbol_name_matcher(char const*, lookup_name_info const&, completion_match_result*)
  4.26      4.71     0.33                             language_defn::get_symbol_name_matcher(lookup_name_info const&) const
  3.23      4.96     0.25                             symbol_matches_domain(language, domain_enum_tag, domain_enum_tag)
  2.84      5.18     0.22        1   220.00   220.00  symval_hash_init
  2.20      5.35     0.17                             read_attribute_value(die_reader_specs const*, attribute*, unsigned int, long, unsigned char const*, bool*)
  2.20      5.52     0.17                             iterative_hash
  1.81      5.66     0.14                             strcmp_iw_ordered(char const*, char const*)
  1.81      5.80     0.14                             psym_lookup_symbol(objfile*, block_enum, char const*, domain_enum_tag)
  1.68      5.93     0.13                             dwarf2_psymtab::readin_p(objfile*) const
  1.55      6.05     0.12   133543     0.00     0.00  symname_hash_install
...

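For reference, per-function timings like the ones in the listing above can be
collected with a small wrapper around each init step. This is only a sketch
under assumptions: time_init_ms(), report_init() and demo_init() are
hypothetical names, not crash's actual instrumentation, and it assumes the
figures in the listing are milliseconds (they add up to about 6.2 seconds,
which is plausible against the 8.266 s total).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: run one init step and return the elapsed wall-clock
 * time in milliseconds, measured with the monotonic clock. */
static double time_init_ms(void (*init_fn)(void))
{
    struct timespec start, end;

    assert(init_fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    init_fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
           (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical helper: print "name milliseconds", one line per init step. */
static void report_init(const char *name, void (*init_fn)(void))
{
    printf("%-28s %f\n", name, time_init_ms(init_fn));
}

/* Stand-in for a real init routine (an assumption for the demo):
 * it just sleeps for 10 ms. */
static void demo_init(void)
{
    struct timespec ten_ms = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    nanosleep(&ten_ms, NULL);
}
```

Calling report_init("demo_init", demo_init) prints one line in the same
name/time layout. Unlike gprof, which needs the binary built with -pg, this
kind of wrapper only gives per-step totals and says nothing about which
callees dominate.
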
It looks to me that crash spends a lot of time on symbol resolution, so I
thought maybe I could parallelize some of it to shorten the startup time.

Thanks,
Tao Liu

> >
> > One obvious point which can benefit from multi-threading is memory.c:vm_init().
> > There are hundreds of MEMBER_OFFSET_INIT() related symbol resolving functions,
> > and most of the symbols are independent from each other, by careful arrangement,
> > they can be invoked parallelly. By doing so, we can shorten the waiting time of
> > crash starting.
> >
> > The implementation is abstracted as the following:
> >
> > Before multi-threading:
> >     MEMBER_OFFSET_INIT(task_struct_mm, "task_struct", "mm");
> >     MEMBER_OFFSET_INIT(mm_struct_mmap, "mm_struct", "mmap");
> >
> > After multi-threading:
> >     create_threadpool(&pool, 3);
> >     ...
> >     MEMBER_OFFSET_INIT_PARA(pool, task_struct_mm, "task_struct", "mm");
> >     MEMBER_OFFSET_INIT_PARA(pool, mm_struct_mmap, "mm_struct", "mmap");
> >     ...
> >     wait_and_destroy_threadpool(pool);
> >
> > MEMBER_OFFSET_INIT_PARA just append the task to the work queue of thread pool
> > and continues, it's up to the pool to schedule the worker thread to do the
> > symbol resolving work.
> >
> > However, after enable multi-threading, I noticed there are always random errors
> > from gdb. From segfault to broken stack, it seems gdb is not thread safe at
> > all...
> >
> > For example one error listed as follows:
> >
> > Thread 10 "crash" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffc4f00640 (LWP 72950)]
> > c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
> > 3250        if (pstate->language ()->la_language != language_cplus
> > (gdb) bt
> > #0  c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
> > #1  c_yyparse () at /sources/up-crash/gdb-10.2/gdb/c-exp.c.tmp:2092
> > #2  0x00000000006f62d7 in c_parse (par_state=<optimized out>)
> >     at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3414
> > #3  0x0000000000894eac in parse_exp_in_context (stringptr=0x7fffc4efeff8,
> >     pc=<optimized out>, block=<optimized out>, comma=0, out_subexp=0x0,
> >     tracker=0x7fffc4efef10, cstate=0x0, void_context_p=0) at parse.c:1122
> > #4  0x00000000008951d6 in parse_exp_1 (tracker=0x0, comma=0, block=0x0,
> >     pc=0, stringptr=0x7fffc4efeff8) at parse.c:1031
> > #5  parse_expression (string=<optimized out>, string@entry=0x7fffc4eff140
> >     "slab_s", tracker=tracker@entry=0x0) at parse.c:1166
> > #6  0x000000000092039a in gdb_get_datatype (req=0x7fffc4eff720) at symtab.c:7239
> > #7  gdb_command_funnel_1 (req=0x7fffc4eff720) at symtab.c:7018
> > #8  0x00000000009206de in gdb_command_funnel (req=0x7fffc4eff720) at symtab.c:6956
> > #9  0x00000000005ad137 in gdb_interface (req=0x7fffc4eff720) at gdb_interface.c:409
> > #10 0x00000000005fe76c in datatype_info (name=0xab9700 "slab_s",
> >     member=0xaba8d8 "list", dm=0x0) at symbols.c:5708
> > #11 0x0000000000517a85 in member_offset_init_slab_s_list_slab_s_list ()
> >     at memory.c:659
> > #12 0x000000000068168f in group_routine (args=<optimized out>) at thpool.c:81
> > #13 0x00007ffff7a48b17 in start_thread () from /lib64/libc.so.6
> > #14 0x00007ffff7acd6c0 in clone3 () from /lib64/libc.so.6
> > (gdb) p pstate
> > $1 = (parser_state *) 0x0
> >
> > $ cat -n /sources/up-crash/gdb-10.2/gdb/c-exp.y
> >     66  /* The state of the parser, used internally when we are parsing the
> >     67     expression.  */
> >     68
> >     69  static struct parser_state *pstate = NULL;
> >
> > pstate is a global variable and not thread safe, the value must be changed by
> > someone else...
> >
> > Now the project has reached a dead end. Because making gdb thread safe is an
> > impossible mission to me. Is there any advice or suggestions? Thanks in advance!
> >
> > Thanks!
> > Tao Liu
> >
> > --
> > Crash-utility mailing list
> > Crash-utility@xxxxxxxxxx
> > https://listman.redhat.com/mailman/listinfo/crash-utility
> > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
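
For readers following the thread, the work-queue scheme described in the
quoted message (submit tasks, let worker threads pick them up, then wait and
tear down) can be sketched with pthreads roughly as follows. This is not the
thpool.c from the patch: pool_create(), pool_submit(),
pool_wait_and_destroy() and store_square() are hypothetical names chosen for
illustration. The sample task only writes to its own slot, which is why it is
safe here; a task that goes through shared global state, such as pstate in
gdb's c-exp.y, is not safe without further synchronization.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct task {                   /* one queued unit of work */
    void (*fn)(void *);
    void *arg;
    struct task *next;
};

struct pool {                   /* hypothetical pool: FIFO guarded by one mutex */
    pthread_mutex_t lock;
    pthread_cond_t wake;        /* signalled on submit and on shutdown */
    struct task *head, *tail;
    int shutting_down;
    int nthreads;
    pthread_t threads[];
};

static void *worker(void *p)
{
    struct pool *pool = p;

    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->shutting_down)
            pthread_cond_wait(&pool->wake, &pool->lock);
        struct task *t = pool->head;
        if (!t) {               /* queue drained and shutdown requested */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = t->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);

        t->fn(t->arg);          /* run the task outside the lock */
        free(t);
    }
}

static struct pool *pool_create(int nthreads)
{
    assert(nthreads > 0);
    struct pool *pool = calloc(1, sizeof(*pool) +
                                  (size_t)nthreads * sizeof(pthread_t));
    if (!pool)
        abort();
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->wake, NULL);
    pool->nthreads = nthreads;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            abort();
    return pool;
}

/* Append a task to the queue and return immediately. */
static void pool_submit(struct pool *pool, void (*fn)(void *), void *arg)
{
    struct task *t = malloc(sizeof(*t));
    if (!t)
        abort();
    *t = (struct task){ .fn = fn, .arg = arg, .next = NULL };

    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers drain the queue, join them, then free the pool. */
static void pool_wait_and_destroy(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->wake);
    pthread_mutex_unlock(&pool->lock);

    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_cond_destroy(&pool->wake);
    pthread_mutex_destroy(&pool->lock);
    free(pool);
}

/* Sample task: touches only its own slot, so it needs no locking. */
static void store_square(void *arg)
{
    long *slot = arg;
    *slot = *slot * *slot;
}
```

In the terms of the quoted message, pool_submit() plays the role of
MEMBER_OFFSET_INIT_PARA() and pool_wait_and_destroy() that of
wait_and_destroy_threadpool(). The pool itself is the easy part; the crash
in the backtrace comes from what the tasks call into, not from the queueing.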