Re: [PATCH kbuild] kbuild: add -grecord-gcc-switches to clang build

On 3/30/21 8:16 PM, David Blaikie wrote:
On Tue, Mar 30, 2021 at 8:13 PM Yonghong Song <yhs@xxxxxx> wrote:



On 3/30/21 7:51 PM, David Blaikie wrote:
On Tue, Mar 30, 2021 at 7:39 PM Fāng-ruì Sòng <maskray@xxxxxxxxxx> wrote:

On Tue, Mar 30, 2021 at 6:48 PM Yonghong Song <yhs@xxxxxx> wrote:



On 3/30/21 5:25 PM, Fangrui Song wrote:
On 2021-03-30, 'Yonghong Song' via Clang Built Linux wrote:


On 3/29/21 3:52 PM, Nick Desaulniers wrote:
(replying to
https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@xxxxxx/)

Thanks for the patch!

+# gcc emits compilation flags in dwarf DW_AT_producer by default
+# while clang needs explicit flag. Add this flag explicitly.
+ifdef CONFIG_CC_IS_CLANG
+DEBUG_CFLAGS    += -grecord-gcc-switches
+endif
+

Yes, gcc defaults to -grecord-gcc-switches. Clang doesn't.

Do you know why? A DWARF size concern?


This adds ~5MB/1% to the vmlinux of an x86_64 defconfig built with clang.
Do we want to add additional guards for CONFIG_DEBUG_INFO_BTF, so that
we don't have to pay that cost if that config is not set?

This patch is mostly motivated by detecting whether the kernel is
built with clang LTO or not, so let me add the flag only if LTO is
enabled. My measurement shows a 0.5% increase for a ThinLTO vmlinux.
The smaller percentage is due to the larger .debug_info section
(almost double) for ThinLTO vs. no LTO.

ifdef CONFIG_LTO_CLANG
DEBUG_CFLAGS   += -grecord-gcc-switches
endif

This will make pahole work with any clang-built kernel, LTO or non-LTO.

I share the same concern about sizes. Can't pahole tell that it is clang
LTO by other means? If pahole just needs to know one bit of information
(clang LTO or not), recording every compile option seems unnecessary...

This is v2 of the patch
     https://lore.kernel.org/bpf/20210331001623.2778934-1-yhs@xxxxxx/
The flag will be guarded by CONFIG_LTO_CLANG.

As mentioned in the commit message of v2, the alternative is
to go through every CU to find out whether DW_FORM_ref_addr is used
or not. In other words, check every possible cross-CU reference
to find out whether a cross-CU reference actually happens or not. This
is quite heavy for pahole...

What we really want to know is whether a cross-CU reference happens
or not. If there is an easy way to find that out, that would be great.

+David Blaikie

Yep, that shouldn't be too hard to test for more directly - scanning
.debug_abbrev for DW_FORM_ref_addr should be what you need. Would that
be workable rather than relying on detecting clang/lto from command
line parameters? (GCC can produce these cross-CU references too, when
using lto - so this approach would help make the solution generalize
over GCC's behavior too)

Thanks, David. This should be better. I tried it with a non-LTO vmlinux.
Ran 'llvm-dwarfdump --debug-abbrev vmlinux > log' and then
'grep "DW_CHILDREN_no" log | wc -l', and got 231676 records.

What conclusions are you drawing from this number/data? (I'm not
following how DW_CHILDREN_no relates to the topic - perhaps I'm
missing something)

Approximation of the number of tags to visit:

...
[10] DW_TAG_array_type  DW_CHILDREN_yes
        DW_AT_type      DW_FORM_ref4
        DW_AT_sibling   DW_FORM_ref4

[11] DW_TAG_variable    DW_CHILDREN_no
        DW_AT_name      DW_FORM_strp
        DW_AT_decl_file DW_FORM_data1
        DW_AT_decl_line DW_FORM_data2
        DW_AT_decl_column       DW_FORM_data1
        DW_AT_type      DW_FORM_ref4
        DW_AT_external  DW_FORM_flag_present
        DW_AT_declaration       DW_FORM_flag_present

[12] DW_TAG_member      DW_CHILDREN_no
        DW_AT_name      DW_FORM_string
        DW_AT_decl_file DW_FORM_data1
        DW_AT_decl_line DW_FORM_data1
        DW_AT_decl_column       DW_FORM_data1
        DW_AT_type      DW_FORM_ref4
        DW_AT_data_member_location      DW_FORM_data1

[13] DW_TAG_subrange_type       DW_CHILDREN_no
        DW_AT_type      DW_FORM_ref4
        DW_AT_upper_bound       DW_FORM_data1
...
A bigger number means more tags to visit, which will consume more time.
For a binary not compiled with LTO, all these tags will be visited
before declaring that the DWARF has no cross-CU references.
So the number is just a rough relative estimate of the CPU cost. But yeah,
I have to have a real implementation first...


I will try this approach. If the time is a very small fraction of the
actual DWARF CU processing time, we should be fine. This is definitely
better than visiting all DIEs in a CU trying to detect cross-CU references.

*fingers crossed*
