Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@xxxxxxxxxx> wrote:
>>
>> Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> writes:
>> [...]
>> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> >> lite debugging experience. Some things will be missing:
>> >
>> >> - mapping backtrace addresses to source code lines
>> >
>> > So, BTF has provisions for that, and its present in the eBPF programs,
>> > perf annotate uses it, see tools/perf/util/annotate.c,
>> > symbol__disassemble_bpf(), it goes like:
>> >
>> >         struct bpf_prog_linfo *prog_linfo = NULL;
>> >
>> >         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>> >                                                  dso->bpf_prog.id);
>> >         if (!info_node) {
>> >                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>> >                 goto out;
>> >         }
>> >         info_linear = info_node->info_linear;
>> >         sub_id = dso->bpf_prog.sub_id;
>> >
>> >         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>> >         info.buffer_length = info_linear->info.jited_prog_len;
>> >
>> >         if (info_linear->info.nr_line_info)
>> >                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>> >
>> >         addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>> >         count = disassemble(pc, &info);
>> >
>> >         if (prog_linfo)
>> >                 linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>> >                                                         addr, sub_id,
>> >                                                         nr_skip);
>> >         if (linfo && btf) {
>> >                 srcline = btf__name_by_offset(btf, linfo->line_off);
>> >                 nr_skip++;
>> >         } else
>> >                 srcline = NULL;
>> >
>> > etc.
>> >
>> > Having this for the kernel proper is thus doable, but then we go on
>> > making BTF info grow.
>> >
>> > Perhaps having this as optional, distros or appliances wanting to have a
>> > kernel with this extra info would add it and then tools would use it if
>> > available?
>>
>> I didn't know about the source code mapping support! And I certainly see
>> the utility of it for BPF programs.
>> However, I'm not sure that a "lite"
>> kernel debugging experience *needs* source line mapping. I suppose I
>> should have made it more clear, but I don't think of that list of
>> "missing" features as a checklist of things we'd want feature parity
>> for.
>>
>> The advantage of BTF for debugging would be that it is small, and that
>> it is part of the kernel image without referencing any other file,
>> build-id, or kernel version. Ideally, a debugger could load a crash dump
>> with no additional information, and support a reasonable level of
>> debugging. I think looking up typed data structure values via global
>> symbols is part of that level, as well as simple backtraces and other
>> memory access.
>>
>> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
>> the DWARF debuginfo, then your experience should be much better.
>>
>> >> - intelligent stack frame information from DWARF CFI (e.g.
>> >>   register/variable values)
>> >> - probably other things, I'm not a DWARF expert.
>> [...]
>> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> >> > idea. But we might be able to just add global variables without this
>> >> > new config if we have strong use case.
>> >
>> >> And unfortunately 1MiB is really just a shot in the dark, guessing
>> >> around 70k variables with no string data.
>> >
>> > Maybe we can have a separate BTF file with all this extra info that
>> > could be fetched from somewhere, keyed by build-id, like is now possible
>> > with debuginfod and DWARF?
>>
>> For me, this ranges into the territory of duplicating DWARF. If you lose
>> the one key advantage of "debuginfoless debugging", then you might as
>> well use the build-id to lookup DWARF debuginfo as we can today.
>>
>> This is why I'm trying to propose the means of combining the kallsyms
>> string data with BTF.
>> Anything that can make the overall size increase
>> manageable so that all the necessary data can stay in the kernel image.
>
> I think this quirk of using kallsyms strings is a no-go. But we should
> experiment and see how much bigger BTF becomes when including all the
> variables. Can you try to prototype pahole's support for this?

Hi Andrii,

Sorry for such a delay here. I tried to prototype this last month but
encountered some issues I couldn't resolve. But recently I picked it up
again and I've created a prototype [1] which outputs all variables.
(It's quite a bad prototype: it strips out some useful logic regarding
the BTF_VAR_DATASEC for percpu variables. But I think it's good enough.)

On my 5.4-based kernel I saw an increase in BTF section size from about
3.7 MiB all the way to 6.1 MiB, or more precisely:

BTF section before: 3905938 bytes
BTF section after:  6391989 bytes (+2486051, +63.6%)

So roughly a 2.4 MiB increase. My prototype doesn't output the
btf_var_secinfo structs for percpu variables anymore, which probably
breaks some BPF and reduces BTF slightly. But it is also outputting a
few thousand "dwarf variables" which were correctly filtered before, so
I think it's a wash and it's a pretty good comparison.

Clearly it can't be added without a configuration option, as roughly
2.4 MiB is pretty huge for a kernel memory addition. But I don't think
it's so huge that nobody would enable it. I know I would :)

[1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1

> As you
> said, we can guard this extra information with KConfig and pahole
> flags, so distros can always opt-out of bigger BTF if that's too
> prohibitive. As it is right now, without firm understanding how big
> the final BTF is it's hard to make a good decision about go or no-go
> for this.

Hopefully this comparison sheds some light on that now!

>
> As for including source code itself, it going to be prohibitively
> huge, so it's probably out of the question for now as well.

Yeah, I wouldn't advocate for that.

Now, to share some of the cool possibilities that this enables. I have:

- prototype pahole [1] used for the kernel build,
- a prototype drgn with BTF+kallsyms support [2],
- some small kernel patches which add symbols to vmcoreinfo, so that
  drgn can find the kallsyms section. I'm happy to share these, I just
  haven't sent them anywhere yet.

[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

Combining these three things, I've got a debugger which can open up a
vmcore _without DWARF debuginfo_ and allow you to print out typed
variable values. It just relies on BTF + kallsyms. So the proof of
concept is proven, and I'm quite excited about it!

Stephen
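
To make the BTF + kallsyms idea concrete, here is a toy sketch in
Python. It is not the drgn prototype and uses no drgn or kernel API:
the tables, addresses, values, and the read_typed() helper are all
hypothetical stand-ins. The only point it illustrates is the division
of labour: kallsyms maps a symbol name to an address, BTF variable
info maps the same name to a type, and together they are enough to
decode a typed value out of raw memory.

```python
import struct

# All data below is made up for illustration (assumption, not real output).
# kallsyms-like table: symbol name -> address of the global variable.
KALLSYMS = {"jiffies_64": 0x1000, "nr_cpu_ids": 0x1008}

# BTF-like table: symbol name -> (C type name, struct format for decoding).
BTF_VARS = {"jiffies_64": ("u64", "<Q"), "nr_cpu_ids": ("unsigned int", "<I")}

# Fake "vmcore" memory: a little-endian u64 followed by a u32, mapped at BASE.
BASE = 0x1000
MEMORY = struct.pack("<QI", 4294937296, 8)


def read_typed(name):
    """Return (type_name, value) for a global using only the two tables."""
    addr = KALLSYMS[name]             # kallsyms: where the variable lives
    type_name, fmt = BTF_VARS[name]   # BTF: how to interpret its bytes
    offset = addr - BASE
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, MEMORY[offset:offset + size])
    return type_name, value


if __name__ == "__main__":
    for sym in KALLSYMS:
        type_name, value = read_typed(sym)
        print(f"({type_name}){sym} = {value}")
```

A real debugger would also have to walk struct and pointer types from
BTF and translate virtual addresses in the dump, which this sketch
leaves out entirely.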