On Tue, Mar 30, 2021 at 8:27 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 3/30/21 8:16 PM, David Blaikie wrote:
> > On Tue, Mar 30, 2021 at 8:13 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >>
> >>
> >> On 3/30/21 7:51 PM, David Blaikie wrote:
> >>> On Tue, Mar 30, 2021 at 7:39 PM Fāng-ruì Sòng <maskray@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, Mar 30, 2021 at 6:48 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/30/21 5:25 PM, Fangrui Song wrote:
> >>>>>> On 2021-03-30, 'Yonghong Song' via Clang Built Linux wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/29/21 3:52 PM, Nick Desaulniers wrote:
> >>>>>>>> (replying to
> >>>>>>>> https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@xxxxxx/)
> >>>>>>>>
> >>>>>>>> Thanks for the patch!
> >>>>>>>>
> >>>>>>>>> +# gcc emits compilation flags in dwarf DW_AT_producer by default
> >>>>>>>>> +# while clang needs explicit flag. Add this flag explicitly.
> >>>>>>>>> +ifdef CONFIG_CC_IS_CLANG
> >>>>>>>>> +DEBUG_CFLAGS += -grecord-gcc-switches
> >>>>>>>>> +endif
> >>>>>>>>> +
> >>>>>>
> >>>>>> Yes, gcc defaults to -grecord-gcc-switches. Clang doesn't.
> >>>>>
> >>>>> Could you know why? dwarf size concern?
> >>>>>
> >>>>>>
> >>>>>>>> This adds ~5MB/1% to vmlinux of an x86_64 defconfig built with clang.
> >>>>>>>> Do we
> >>>>>>>> want to add additional guards for CONFIG_DEBUG_INFO_BTF, so that we
> >>>>>>>> don't have
> >>>>>>>> to pay that cost if that config is not set?
> >>>>>>>
> >>>>>>> Since this patch is mostly motivated to detect whether the kernel is
> >>>>>>> built with clang lto or not. Let me add the flag only if lto is
> >>>>>>> enabled. My measurement shows 0.5% increase to thinlto-vmlinux.
> >>>>>>> The smaller percentage is due to larger .debug_info section
> >>>>>>> (almost double) for thinlto vs. no lto.
> >>>>>>>
> >>>>>>> ifdef CONFIG_LTO_CLANG
> >>>>>>> DEBUG_CFLAGS += -grecord-gcc-switches
> >>>>>>> endif
> >>>>>>>
> >>>>>>> This will make pahole with any clang built kernels, lto or non-lto.
> >>>>>>
> >>>>>> I share the same concern about sizes. Can't pahole know it is clang LTO
> >>>>>> via other means? If pahole just needs to know the one-bit information
> >>>>>> (clang LTO vs not), having every compile option seems unnecessary....
> >>>>>
> >>>>> This is v2 of the patch
> >>>>> https://lore.kernel.org/bpf/20210331001623.2778934-1-yhs@xxxxxx/
> >>>>> The flag will be guarded with CONFIG_LTO_CLANG.
> >>>>>
> >>>>> As mentioned in commit message of v2, the alternative is
> >>>>> to go through every cu to find out whether DW_FORM_ref_addr is used
> >>>>> or not. In other words, check every possible cross-cu references
> >>>>> to find whether cross-cu reference actually happens or not. This
> >>>>> is quite heavy for pahole...
> >>>>>
> >>>>> What we really want to know is whether cross-cu reference happens
> >>>>> or not? If there is an easy way to get it, that will be great.
> >>>>
> >>>> +David Blaikie
> >>>
> >>> Yep, that shouldn't be too hard to test for more directly - scanning
> >>> .debug_abbrev for DW_FORM_ref_addr should be what you need. Would that
> >>> be workable rather than relying on detecting clang/lto from command
> >>> line parameters? (GCC can produce these cross-CU references too, when
> >>> using lto - so this approach would help make the solution generalize
> >>> over GCC's behavior too)
> >>
> >> Thanks, David. This should be better. I tried with a non-lto vmlinux.
> >> Did "llvm-dwarfdump --debug-abbrev vmlinux > log" and then
> >> "grep "DW_CHILDREN_no" log | wc -l" and get 231676 records.
> >
> > What conclusions are you drawing from this number/data? (I'm not
> > following how DW_CHILDREN_no relates to the topic - perhaps I'm
> > missing something)
>
> Approximation of the number of tags to visit:
>
> ...
> [10] DW_TAG_array_type  DW_CHILDREN_yes
>         DW_AT_type      DW_FORM_ref4
>         DW_AT_sibling   DW_FORM_ref4
>
> [11] DW_TAG_variable    DW_CHILDREN_no
>         DW_AT_name      DW_FORM_strp
>         DW_AT_decl_file DW_FORM_data1
>         DW_AT_decl_line DW_FORM_data2
>         DW_AT_decl_column       DW_FORM_data1
>         DW_AT_type      DW_FORM_ref4
>         DW_AT_external  DW_FORM_flag_present
>         DW_AT_declaration       DW_FORM_flag_present
>
> [12] DW_TAG_member      DW_CHILDREN_no
>         DW_AT_name      DW_FORM_string
>         DW_AT_decl_file DW_FORM_data1
>         DW_AT_decl_line DW_FORM_data1
>         DW_AT_decl_column       DW_FORM_data1
>         DW_AT_type      DW_FORM_ref4
>         DW_AT_data_member_location      DW_FORM_data1
>
> [13] DW_TAG_subrange_type       DW_CHILDREN_no
>         DW_AT_type      DW_FORM_ref4
>         DW_AT_upper_bound       DW_FORM_data1
> ...
> The bigger number means more tags to visit and will consume more time.
> For a binary not compiled with lto, all these tags will be visited
> before declaring that the dwarf does not have cross-cu reference.
> So the number is just a relative guess on the cpu cost. But ya,
> have to have real implementation first...

Ah, sounds good, yeah.
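
For illustration only, here is a rough sketch of the .debug_abbrev scan
discussed above. It is not pahole's implementation. It assumes the raw,
uncompressed bytes of the section are already in memory (how they are
extracted is left open), and the function names are made up for this
example. It walks each abbreviation declaration and stops at the first
attribute encoded as DW_FORM_ref_addr, so a binary without cross-cu
references pays for a full pass over the section, which is the cost
being estimated in the quoted message.

```python
# Form codes from the DWARF specification.
DW_FORM_ref_addr = 0x10
DW_FORM_implicit_const = 0x21  # DWARF 5: stores an extra SLEB128 in the abbrev


def read_uleb128(data, pos):
    """Decode one ULEB128 value at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def abbrev_uses_ref_addr(section):
    """Return True if any abbreviation in the raw .debug_abbrev bytes
    declares an attribute with DW_FORM_ref_addr (hypothetical helper).

    Truncated input raises IndexError; a real tool would need to handle
    that, as well as compressed debug sections.
    """
    pos = 0
    end = len(section)
    while pos < end:
        code, pos = read_uleb128(section, pos)
        if code == 0:
            continue  # end of one CU's abbrev table, or padding
        _tag, pos = read_uleb128(section, pos)
        pos += 1  # skip the DW_CHILDREN_yes / DW_CHILDREN_no byte
        while True:
            attr, pos = read_uleb128(section, pos)
            form, pos = read_uleb128(section, pos)
            if form == DW_FORM_implicit_const:
                # Only the encoded length matters here, so the unsigned
                # decoder is enough to step over the SLEB128 constant.
                _, pos = read_uleb128(section, pos)
            if attr == 0 and form == 0:
                break  # end of this declaration's attribute list
            if form == DW_FORM_ref_addr:
                return True
    return False


# Abbrev [13] from the dump above, hand-encoded:
# DW_TAG_subrange_type (0x21), DW_CHILDREN_no,
# DW_AT_type (0x49) / DW_FORM_ref4 (0x13),
# DW_AT_upper_bound (0x2f) / DW_FORM_data1 (0x0b).
sample = bytes([0x0D, 0x21, 0x00, 0x49, 0x13, 0x2F, 0x0B, 0x00, 0x00, 0x00])
print(abbrev_uses_ref_addr(sample))
```

Since the abbrev tables are far smaller than .debug_info, this should be
cheaper than visiting every DIE, but that is a guess until it is measured
on a real vmlinux.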