Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/19/2024 03:22 AM, Alex Deucher wrote:
On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:

On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
Hi, Tiezhu,

On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@xxxxxxxxxxx> wrote:

When compiling with Clang on LoongArch, there exists the following objtool
warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:

  dc_fixpt_recip() falls through to next function dc_fixpt_sinc()

This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
by Clang, given dc_fixpt_from_fraction() is not a simple function, just
mark it noinline to avoid the above issue.
I don't know whether drm maintainers can accept this, because it looks
like a workaround. Yes, uninline this function "solve" a problem and
seems reasonable in this case because the function is "not simple",
but from another point of view, you may hide a type of bug.

Agreed, it sounds like there's definitely a bug which this patch is
papering over.

Yes, agreed.

Additional Info:

In order to avoid the effect of this series, I tested with kernel
6.13-rc3 without this series again.

--------------------------------------------------------------------------
Here are the test result.

1. For LoongArch

"dc_fixpt_recip() falls through to next function dc_fixpt_sinc()"
objtool warning only appears compiled with the latest mainline LLVM,
there is no this issue compiled with the latest release version such
as LLVM 19.1.6.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "b" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000200 <dc_fixpt_recip>:
200: 40009480 beqz $a0, 148 # 294 <dc_fixpt_recip+0x94>
     204:       0049fc85        srai.d          $a1, $a0, 0x3f
     208:       00159486        xor             $a2, $a0, $a1
     20c:       001194c5        sub.d           $a1, $a2, $a1
     210:       03800007        ori             $a3, $zero, 0x0
     214:       16000027        lu32i.d         $a3, 1
     218:       002314e6        div.du          $a2, $a3, $a1
     21c:       001d94c8        mul.d           $a4, $a2, $a1
     220:       0011a0e9        sub.d           $a5, $a3, $a4
     224:       02bf8008        addi.w          $a4, $zero, -32
     228:       03400000        andi            $zero, $zero, 0x0
     22c:       03400000        andi            $zero, $zero, 0x0
     230:       00410529        slli.d          $a5, $a5, 0x1
     234:       004104c6        slli.d          $a2, $a2, 0x1
     238:       0012952a        sltu            $a6, $a5, $a1
     23c:       03c0054a        xori            $a6, $a6, 0x1
     240:       001328ab        maskeqz         $a7, $a1, $a6
     244:       0011ad29        sub.d           $a5, $a5, $a7
     248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
     24c:       02c00508        addi.d          $a4, $a4, 1
     250:       00149d0b        and             $a7, $a4, $a3
     254:       001528c6        or              $a2, $a2, $a6
258: 43ffd97f beqz $a7, -40 # 230 <dc_fixpt_recip+0x30>
     25c:       00410527        slli.d          $a3, $a5, 0x1
     260:       001294e5        sltu            $a1, $a3, $a1
     264:       03c004a5        xori            $a1, $a1, 0x1
     268:       02bffc07        addi.w          $a3, $zero, -1
     26c:       031ffce7        lu52i.d         $a3, $a3, 2047
     270:       00159ca7        xor             $a3, $a1, $a3
274: 680030e6 bltu $a3, $a2, 48 # 2a4 <dc_fixpt_recip+0xa4>
     278:       001094c5        add.d           $a1, $a2, $a1
     27c:       00119406        sub.d           $a2, $zero, $a1
     280:       02000084        slti            $a0, $a0, 0
     284:       001390a5        masknez         $a1, $a1, $a0
     288:       001310c4        maskeqz         $a0, $a2, $a0
     28c:       00151484        or              $a0, $a0, $a1
     290:       4c000020        jirl            $zero, $ra, 0
     294:       00150005        or              $a1, $zero, $zero
     298:       002a0001        break           0x1
     29c:       002a0001        break           0x1
2a0: 53ff73ff b -144 # 210 <dc_fixpt_recip+0x10>
     2a4:       002a0001        break           0x1
2a8: 53ffd3ff b -48 # 278 <dc_fixpt_recip+0x78>
     2ac:       03400000        andi            $zero, $zero, 0x0
     2b0:       03400000        andi            $zero, $zero, 0x0
     2b4:       03400000        andi            $zero, $zero, 0x0
     2b8:       03400000        andi            $zero, $zero, 0x0
     2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
     2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
     2c4:       00159486        xor             $a2, $a0, $a1
     2c8:       1490fda7        lu12i.w         $a3, 296941
     2cc:       039444e7        ori             $a3, $a3, 0x511
     2d0:       001194c6        sub.d           $a2, $a2, $a1
     2d4:       001500e5        or              $a1, $a3, $zero
     2d8:       160000c5        lu32i.d         $a1, 6
     2dc:       001500c9        or              $a5, $a2, $zero
     2e0:       00150088        or              $a4, $a0, $zero
2e4: 6000a8c5 blt $a2, $a1, 168 # 38c <dc_fixpt_sinc+0xcc>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 8daf4f16fa08b5d876e98108721dd1743a360326)

There is "break" instruction at the end of dc_fixpt_recip(), its offset
is "2a0", this "break" instruction is not dead end due to the offset
"2a4" is in the relocation section '.rela.discard.reachable', that is
to say, dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000200 <dc_fixpt_recip>:
200: 40009c80 beqz $a0, 156 # 29c <dc_fixpt_recip+0x9c>
     204:       0049fc85        srai.d          $a1, $a0, 0x3f
     208:       00159486        xor             $a2, $a0, $a1
     20c:       001194c5        sub.d           $a1, $a2, $a1
     210:       03800007        ori             $a3, $zero, 0x0
     214:       16000027        lu32i.d         $a3, 1
     218:       002314e6        div.du          $a2, $a3, $a1
     21c:       001d94c8        mul.d           $a4, $a2, $a1
     220:       0011a0e9        sub.d           $a5, $a3, $a4
     224:       02bf8008        addi.w          $a4, $zero, -32
     228:       03400000        andi            $zero, $zero, 0x0
     22c:       03400000        andi            $zero, $zero, 0x0
     230:       00410529        slli.d          $a5, $a5, 0x1
     234:       004104c6        slli.d          $a2, $a2, 0x1
     238:       0012952a        sltu            $a6, $a5, $a1
     23c:       03c0054a        xori            $a6, $a6, 0x1
     240:       001328ab        maskeqz         $a7, $a1, $a6
     244:       0011ad29        sub.d           $a5, $a5, $a7
     248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
     24c:       02c00508        addi.d          $a4, $a4, 1
     250:       00149d0b        and             $a7, $a4, $a3
     254:       001528c6        or              $a2, $a2, $a6
258: 43ffd97f beqz $a7, -40 # 230 <dc_fixpt_recip+0x30>
     25c:       00410527        slli.d          $a3, $a5, 0x1
     260:       001294e5        sltu            $a1, $a3, $a1
     264:       03c004a5        xori            $a1, $a1, 0x1
     268:       02bffc07        addi.w          $a3, $zero, -1
     26c:       031ffce7        lu52i.d         $a3, $a3, 2047
     270:       00159ca7        xor             $a3, $a1, $a3
274: 680020e6 bltu $a3, $a2, 32 # 294 <dc_fixpt_recip+0x94>
     278:       001094c5        add.d           $a1, $a2, $a1
     27c:       00119406        sub.d           $a2, $zero, $a1
     280:       02000084        slti            $a0, $a0, 0
     284:       001390a5        masknez         $a1, $a1, $a0
     288:       001310c4        maskeqz         $a0, $a2, $a0
     28c:       00151484        or              $a0, $a0, $a1
     290:       4c000020        jirl            $zero, $ra, 0
     294:       002a0001        break           0x1
298: 53ffe3ff b -32 # 278 <dc_fixpt_recip+0x78>
     29c:       002a0001        break           0x1
     2a0:       002a0001        break           0x1
     2a4:       03400000        andi            $zero, $zero, 0x0
     2a8:       03400000        andi            $zero, $zero, 0x0
     2ac:       03400000        andi            $zero, $zero, 0x0
     2b0:       03400000        andi            $zero, $zero, 0x0
     2b4:       03400000        andi            $zero, $zero, 0x0
     2b8:       03400000        andi            $zero, $zero, 0x0
     2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
     2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
     2c4:       00159486        xor             $a2, $a0, $a1
     2c8:       001194c6        sub.d           $a2, $a2, $a1
     2cc:       1490fda5        lu12i.w         $a1, 296941
     2d0:       039444a5        ori             $a1, $a1, 0x511
     2d4:       160000c5        lu32i.d         $a1, 6
     2d8:       001500c7        or              $a3, $a2, $zero
     2dc:       00150088        or              $a4, $a0, $zero
2e0: 600090c5 blt $a2, $a1, 144 # 370 <dc_fixpt_sinc+0xb0>

2. For x86

I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
is same with LoongArch.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000290 <dc_fixpt_recip>:
     290:       f3 0f 1e fa             endbr64
     294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
     299:       48 85 ff                test   %rdi,%rdi
     29c:       0f 84 a8 00 00 00       je     34a <dc_fixpt_recip+0xba>
     2a2:       48 89 f9                mov    %rdi,%rcx
     2a5:       48 f7 d9                neg    %rcx
     2a8:       48 0f 48 cf             cmovs  %rdi,%rcx
     2ac:       53                      push   %rbx
     2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
     2b4:       00 00 00
     2b7:       31 d2                   xor    %edx,%edx
     2b9:       48 f7 f1                div    %rcx
     2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
     2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
     2c3:       45 88 c8                mov    %r9b,%r8b
     2c6:       4d 01 c0                add    %r8,%r8
     2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
     2cd:       48 29 da                sub    %rbx,%rdx
     2d0:       45 88 da                mov    %r11b,%r10b
     2d3:       4c 89 d0                mov    %r10,%rax
     2d6:       4c 09 c0                or     %r8,%rax
     2d9:       83 c6 02                add    $0x2,%esi
     2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
     2de:       48 01 d2                add    %rdx,%rdx
     2e1:       45 31 c0                xor    %r8d,%r8d
     2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
     2ea:       48 39 ca                cmp    %rcx,%rdx
     2ed:       41 0f 93 c1             setae  %r9b
     2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
     2f3:       49 89 ca                mov    %rcx,%r10
     2f6:       4c 29 d2                sub    %r10,%rdx
     2f9:       48 01 d2                add    %rdx,%rdx
     2fc:       45 31 d2                xor    %r10d,%r10d
     2ff:       48 89 cb                mov    %rcx,%rbx
     302:       48 39 ca                cmp    %rcx,%rdx
     305:       41 0f 93 c3             setae  %r11b
     309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
     30b:       31 db                   xor    %ebx,%ebx
     30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
     30f:       48 01 d2                add    %rdx,%rdx
     312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
     319:       ff ff 7f
     31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
     320:       48 39 ca                cmp    %rcx,%rdx
     323:       4c 0f 43 c6             cmovae %rsi,%r8
     327:       4c 39 c0                cmp    %r8,%rax
     32a:       77 29                   ja     355 <dc_fixpt_recip+0xc5>
     32c:       48 39 ca                cmp    %rcx,%rdx
     32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
     333:       48 89 c1                mov    %rax,%rcx
     336:       48 f7 d9                neg    %rcx
     339:       48 85 ff                test   %rdi,%rdi
     33c:       48 0f 49 c8             cmovns %rax,%rcx
     340:       48 89 c8                mov    %rcx,%rax
     343:       5b                      pop    %rbx
     344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
     34a:       0f 0b                   ud2
     34c:       0f 0b                   ud2
     34e:       31 c9                   xor    %ecx,%ecx
     350:       e9 57 ff ff ff          jmp    2ac <dc_fixpt_recip+0x1c>
     355:       0f 0b                   ud2
     357:       eb d3                   jmp    32c <dc_fixpt_recip+0x9c>
     359:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000000360 <.Ltmp40>:
     360:       90                      nop
     361:       90                      nop
     362:       90                      nop
     363:       90                      nop
     364:       90                      nop
     365:       90                      nop
     366:       90                      nop
     367:       90                      nop
     368:       90                      nop
     369:       90                      nop
     36a:       90                      nop
     36b:       90                      nop
     36c:       90                      nop
     36d:       90                      nop
     36e:       90                      nop
     36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
     370:       f3 0f 1e fa             endbr64
     374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 8daf4f16fa08b5d876e98108721dd1743a360326)

There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
is "350", this "ud2" instruction is not dead end due to the offset "352"
is in the relocation section '.rela.discard.reachable', that is to say,
dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000290 <dc_fixpt_recip>:
     290:       f3 0f 1e fa             endbr64
     294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
     299:       48 85 ff                test   %rdi,%rdi
     29c:       0f 84 ac 00 00 00       je     34e <dc_fixpt_recip+0xbe>
     2a2:       53                      push   %rbx
     2a3:       48 89 f9                mov    %rdi,%rcx
     2a6:       48 f7 d9                neg    %rcx
     2a9:       48 0f 48 cf             cmovs  %rdi,%rcx
     2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
     2b4:       00 00 00
     2b7:       31 d2                   xor    %edx,%edx
     2b9:       48 f7 f1                div    %rcx
     2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
     2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
     2c3:       45 88 c8                mov    %r9b,%r8b
     2c6:       4d 01 c0                add    %r8,%r8
     2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
     2cd:       48 29 da                sub    %rbx,%rdx
     2d0:       45 88 da                mov    %r11b,%r10b
     2d3:       4c 89 d0                mov    %r10,%rax
     2d6:       4c 09 c0                or     %r8,%rax
     2d9:       83 c6 02                add    $0x2,%esi
     2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
     2de:       48 01 d2                add    %rdx,%rdx
     2e1:       45 31 c0                xor    %r8d,%r8d
     2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
     2ea:       48 39 ca                cmp    %rcx,%rdx
     2ed:       41 0f 93 c1             setae  %r9b
     2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
     2f3:       49 89 ca                mov    %rcx,%r10
     2f6:       4c 29 d2                sub    %r10,%rdx
     2f9:       48 01 d2                add    %rdx,%rdx
     2fc:       45 31 d2                xor    %r10d,%r10d
     2ff:       48 89 cb                mov    %rcx,%rbx
     302:       48 39 ca                cmp    %rcx,%rdx
     305:       41 0f 93 c3             setae  %r11b
     309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
     30b:       31 db                   xor    %ebx,%ebx
     30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
     30f:       48 01 d2                add    %rdx,%rdx
     312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
     319:       ff ff 7f
     31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
     320:       48 39 ca                cmp    %rcx,%rdx
     323:       4c 0f 43 c6             cmovae %rsi,%r8
     327:       4c 39 c0                cmp    %r8,%rax
     32a:       77 1e                   ja     34a <dc_fixpt_recip+0xba>
     32c:       48 39 ca                cmp    %rcx,%rdx
     32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
     333:       48 89 c1                mov    %rax,%rcx
     336:       48 f7 d9                neg    %rcx
     339:       48 85 ff                test   %rdi,%rdi
     33c:       48 0f 49 c8             cmovns %rax,%rcx
     340:       48 89 c8                mov    %rcx,%rax
     343:       5b                      pop    %rbx
     344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
     34a:       0f 0b                   ud2
     34c:       eb de                   jmp    32c <dc_fixpt_recip+0x9c>
     34e:       0f 0b                   ud2
     350:       0f 0b                   ud2
352: 66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
     359:       1f 84 00 00 00 00 00

0000000000000360 <.Ltmp40>:
     360:       90                      nop
     361:       90                      nop
     362:       90                      nop
     363:       90                      nop
     364:       90                      nop
     365:       90                      nop
     366:       90                      nop
     367:       90                      nop
     368:       90                      nop
     369:       90                      nop
     36a:       90                      nop
     36b:       90                      nop
     36c:       90                      nop
     36d:       90                      nop
     36e:       90                      nop
     36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
     370:       f3 0f 1e fa             endbr64
     374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>
--------------------------------------------------------------------------

In my opinion, if there is a bug, then it is a generic rather than
arch-specific bug.

The root cause is related with LLVM or Linux kernel? I am not sure
because objtool only checks the object info and reports the warning
"falls through" if the check conditions are true.

tools/objtool/check.c
static int validate_branch(struct objtool_file *file, struct symbol *func,
struct instruction *insn, struct insn_state state)
{
        struct alternative *alt;
        struct instruction *next_insn, *prev_insn = NULL;
        struct section *sec;
        u8 visited;
        int ret;

        sec = insn->sec;

        while (1) {
                next_insn = next_insn_to_validate(file, insn);

if (func && insn_func(insn) && func != insn_func(insn)->pfunc) { /* Ignore KCFI type preambles, which always fall through */
                        if (!strncmp(func->name, "__cfi_", 6) ||
                            !strncmp(func->name, "__pfx_", 6))
                                return 0;

                        WARN("%s() falls through to next function %s()",
                             func->name, insn_func(insn)->name);
                        return 1;
                }
        ...
}

How to silence the warning when compiling with the latest mainline
LLVM with a proper way? Modify LLVM code or tools/objtool/check.c
or drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c?

Anyway, I think this patch can be independent of this series and can
be sent separately after the "real bug" is fixed, please ignore it now.

Thanks,
Tiezhu




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux