Re: [PATCH kbuild] kbuild: add -grecord-gcc-switches to clang build

On Tue, Mar 30, 2021 at 8:27 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 3/30/21 8:16 PM, David Blaikie wrote:
> > On Tue, Mar 30, 2021 at 8:13 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >>
> >>
> >> On 3/30/21 7:51 PM, David Blaikie wrote:
> >>> On Tue, Mar 30, 2021 at 7:39 PM Fāng-ruì Sòng <maskray@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, Mar 30, 2021 at 6:48 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/30/21 5:25 PM, Fangrui Song wrote:
> >>>>>> On 2021-03-30, 'Yonghong Song' via Clang Built Linux wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/29/21 3:52 PM, Nick Desaulniers wrote:
> >>>>>>>> (replying to
> >>>>>>>> https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@xxxxxx/)
> >>>>>>>>
> >>>>>>>> Thanks for the patch!
> >>>>>>>>
> >>>>>>>>> +# gcc emits compilation flags in dwarf DW_AT_producer by default
> >>>>>>>>> +# while clang needs explicit flag. Add this flag explicitly.
> >>>>>>>>> +ifdef CONFIG_CC_IS_CLANG
> >>>>>>>>> +DEBUG_CFLAGS    += -grecord-gcc-switches
> >>>>>>>>> +endif
> >>>>>>>>> +
> >>>>>>
> >>>>>> Yes, gcc defaults to -grecord-gcc-switches. Clang doesn't.
> >>>>>
> >>>>> Do you know why? DWARF size concern?
> >>>>>
> >>>>>>
> >>>>>>>> This adds ~5MB/1% to vmlinux of an x86_64 defconfig built with clang.
> >>>>>>>> Do we want to add additional guards for CONFIG_DEBUG_INFO_BTF, so that
> >>>>>>>> we don't have to pay that cost if that config is not set?
> >>>>>>>
> >>>>>>> Since this patch is mostly motivated by detecting whether the kernel
> >>>>>>> is built with clang LTO or not, let me add the flag only if LTO is
> >>>>>>> enabled. My measurement shows a 0.5% increase to thinlto-vmlinux.
> >>>>>>> The smaller percentage is due to the larger .debug_info section
> >>>>>>> (almost double) for thinlto vs. no lto.
> >>>>>>>
> >>>>>>> ifdef CONFIG_LTO_CLANG
> >>>>>>> DEBUG_CFLAGS   += -grecord-gcc-switches
> >>>>>>> endif
> >>>>>>>
> >>>>>>> This will make pahole work with any clang-built kernel, LTO or non-LTO.
> >>>>>>
> >>>>>> I share the same concern about sizes. Can't pahole know it is clang LTO
> >>>>>> via other means? If pahole just needs to know the one-bit information
> >>>>>> (clang LTO vs not), having every compile option seems unnecessary...
> >>>>>
> >>>>> This is v2 of the patch
> >>>>>      https://lore.kernel.org/bpf/20210331001623.2778934-1-yhs@xxxxxx/
> >>>>> The flag will be guarded with CONFIG_LTO_CLANG.
> >>>>>
> >>>>> As mentioned in the commit message of v2, the alternative is
> >>>>> to go through every CU to find out whether DW_FORM_ref_addr is used
> >>>>> or not. In other words, check every possible cross-CU reference
> >>>>> to find out whether a cross-CU reference actually happens. This
> >>>>> is quite heavy for pahole...
> >>>>>
> >>>>> What we really want to know is whether any cross-CU reference
> >>>>> happens. If there is an easy way to get that, that would be great.
> >>>>
> >>>> +David Blaikie
> >>>
> >>> Yep, that shouldn't be too hard to test for more directly - scanning
> >>> .debug_abbrev for DW_FORM_ref_addr should be what you need. Would that
> >>> be workable rather than relying on detecting clang/lto from command
> >>> line parameters? (GCC can produce these cross-CU references too, when
> >>> using lto - so this approach would help make the solution generalize
> >>> over GCC's behavior too)
> >>
> >> Thanks, David. This should be better. I tried with a non-LTO vmlinux.
> >> I ran "llvm-dwarfdump --debug-abbrev vmlinux > log" and then
> >> "grep "DW_CHILDREN_no" log | wc -l" and got 231676 records.
> >
> > What conclusions are you drawing from this number/data? (I'm not
> > following how DW_CHILDREN_no relates to the topic - perhaps I'm
> > missing something)
>
> Approximation of the number of tags to visit:
>
> ...
> [10] DW_TAG_array_type  DW_CHILDREN_yes
>          DW_AT_type      DW_FORM_ref4
>          DW_AT_sibling   DW_FORM_ref4
>
> [11] DW_TAG_variable    DW_CHILDREN_no
>          DW_AT_name      DW_FORM_strp
>          DW_AT_decl_file DW_FORM_data1
>          DW_AT_decl_line DW_FORM_data2
>          DW_AT_decl_column       DW_FORM_data1
>          DW_AT_type      DW_FORM_ref4
>          DW_AT_external  DW_FORM_flag_present
>          DW_AT_declaration       DW_FORM_flag_present
>
> [12] DW_TAG_member      DW_CHILDREN_no
>          DW_AT_name      DW_FORM_string
>          DW_AT_decl_file DW_FORM_data1
>          DW_AT_decl_line DW_FORM_data1
>          DW_AT_decl_column       DW_FORM_data1
>          DW_AT_type      DW_FORM_ref4
>          DW_AT_data_member_location      DW_FORM_data1
>
> [13] DW_TAG_subrange_type       DW_CHILDREN_no
>          DW_AT_type      DW_FORM_ref4
>          DW_AT_upper_bound       DW_FORM_data1
> ...
> A bigger number means more tags to visit, which will consume more time.
> For a binary not compiled with LTO, all these tags have to be visited
> before declaring that the DWARF does not have any cross-CU reference.
> So the number is just a rough relative guess at the CPU cost. But ya,
> we have to have a real implementation first...

Ah, sounds good, yeah.


