On Fri, Dec 20, 2024 at 01:02:18PM +0800, Tiezhu Yang wrote: > 2. For x86 > > I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result > is same with LoongArch. Debian's clang-19 is 19.1.5. > (1) objdump info with LLVM release version 19.1.6: Please always use -r, that's ever so much more readable. > $ clang --version | head -1 > clang version 19.1.6 > > There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it > maybe jumps to a position and then steps to the return instruction, so > there is no "falls through" objtool warning. > > 0000000000000290 <dc_fixpt_recip>: > 290: f3 0f 1e fa endbr64 > 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9> > 299: 48 85 ff test %rdi,%rdi > 29c: 0f 84 a8 00 00 00 je 34a <dc_fixpt_recip+0xba> > 2a2: 48 89 f9 mov %rdi,%rcx > 2a5: 48 f7 d9 neg %rcx > 2a8: 48 0f 48 cf cmovs %rdi,%rcx > 2ac: 53 push %rbx > 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax > 2b4: 00 00 00 > 2b7: 31 d2 xor %edx,%edx > 2b9: 48 f7 f1 div %rcx > 2bc: be e0 ff ff ff mov $0xffffffe0,%esi > 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e> > 2c3: 45 88 c8 mov %r9b,%r8b > 2c6: 4d 01 c0 add %r8,%r8 > 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8 > 2cd: 48 29 da sub %rbx,%rdx > 2d0: 45 88 da mov %r11b,%r10b > 2d3: 4c 89 d0 mov %r10,%rax > 2d6: 4c 09 c0 or %r8,%rax > 2d9: 83 c6 02 add $0x2,%esi > 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f> > 2de: 48 01 d2 add %rdx,%rdx > 2e1: 45 31 c0 xor %r8d,%r8d > 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d > 2ea: 48 39 ca cmp %rcx,%rdx > 2ed: 41 0f 93 c1 setae %r9b > 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66> > 2f3: 49 89 ca mov %rcx,%r10 > 2f6: 4c 29 d2 sub %r10,%rdx > 2f9: 48 01 d2 add %rdx,%rdx > 2fc: 45 31 d2 xor %r10d,%r10d > 2ff: 48 89 cb mov %rcx,%rbx > 302: 48 39 ca cmp %rcx,%rdx > 305: 41 0f 93 c3 setae %r11b > 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33> > 30b: 31 db xor %ebx,%ebx > 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33> > 30f: 48 01 d2 add %rdx,%rdx > 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi > 319: ff ff 7f > 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8 > 320: 48 39 ca cmp %rcx,%rdx > 323: 4c 0f 43 c6 cmovae %rsi,%r8 > 327: 4c 39 c0 cmp %r8,%rax > 32a: 77 29 ja 355 <dc_fixpt_recip+0xc5> > 32c: 48 39 ca cmp %rcx,%rdx > 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax > 333: 48 89 c1 mov %rax,%rcx > 336: 48 f7 d9 neg %rcx > 339: 48 85 ff test %rdi,%rdi > 33c: 48 0f 49 c8 cmovns %rax,%rcx > 340: 48 89 c8 mov %rcx,%rax > 343: 5b pop %rbx > 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba> > 34a: 0f 0b ud2 > 34c: 0f 0b ud2 > 34e: 31 c9 xor %ecx,%ecx > 350: e9 57 ff ff ff jmp 2ac <dc_fixpt_recip+0x1c> > 355: 0f 0b ud2 > 357: eb d3 jmp 32c <dc_fixpt_recip+0x9c> > 359: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) I had to puzzle a bit to get that double ud2 -- not all configs do that. Also, curse the DRM Makefiles, you can't do: make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s :-( They are the two ASSERT()s on line 217 and 54 respectively. They end up asserting the same value twice, so that makes sense. > $ clang --version | head -1 > clang version 20.0.0git (https://github.com/llvm/llvm-project.git > 8daf4f16fa08b5d876e98108721dd1743a360326) So I didn't have a recent build at hand.. so I've not validated the below. > There is "ud2" instruction at the end of dc_fixpt_recip(), its offset > is "350", this "ud2" instruction is not dead end due to the offset "352" > is in the relocation section '.rela.discard.reachable', that is to say, > dc_fixpt_recip() doesn't end with a return instruction or an > unconditional jump, so objtool determined that the function can fall > through into the next function, thus there is "falls through" objtool > warning. > > 0000000000000290 <dc_fixpt_recip>: > 290: f3 0f 1e fa endbr64 > 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9> > 299: 48 85 ff test %rdi,%rdi > 29c: 0f 84 ac 00 00 00 je 34e <dc_fixpt_recip+0xbe> > 2a2: 53 push %rbx > 2a3: 48 89 f9 mov %rdi,%rcx > 2a6: 48 f7 d9 neg %rcx > 2a9: 48 0f 48 cf cmovs %rdi,%rcx > 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax > 2b4: 00 00 00 > 2b7: 31 d2 xor %edx,%edx > 2b9: 48 f7 f1 div %rcx > 2bc: be e0 ff ff ff mov $0xffffffe0,%esi > 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e> > 2c3: 45 88 c8 mov %r9b,%r8b > 2c6: 4d 01 c0 add %r8,%r8 > 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8 > 2cd: 48 29 da sub %rbx,%rdx > 2d0: 45 88 da mov %r11b,%r10b > 2d3: 4c 89 d0 mov %r10,%rax > 2d6: 4c 09 c0 or %r8,%rax > 2d9: 83 c6 02 add $0x2,%esi > 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f> > 2de: 48 01 d2 add %rdx,%rdx > 2e1: 45 31 c0 xor %r8d,%r8d > 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d > 2ea: 48 39 ca cmp %rcx,%rdx > 2ed: 41 0f 93 c1 setae %r9b > 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66> > 2f3: 49 89 ca mov %rcx,%r10 > 2f6: 4c 29 d2 sub %r10,%rdx > 2f9: 48 01 d2 add %rdx,%rdx > 2fc: 45 31 d2 xor %r10d,%r10d > 2ff: 48 89 cb mov %rcx,%rbx > 302: 48 39 ca cmp %rcx,%rdx > 305: 41 0f 93 c3 setae %r11b > 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33> > 30b: 31 db xor %ebx,%ebx > 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33> > 30f: 48 01 d2 add %rdx,%rdx > 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi > 319: ff ff 7f > 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8 > 320: 48 39 ca cmp %rcx,%rdx > 323: 4c 0f 43 c6 cmovae %rsi,%r8 > 327: 4c 39 c0 cmp %r8,%rax > 32a: 77 1e ja 34a <dc_fixpt_recip+0xba> > 32c: 48 39 ca cmp %rcx,%rdx > 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax > 333: 48 89 c1 mov %rax,%rcx > 336: 48 f7 d9 neg %rcx > 339: 48 85 ff test %rdi,%rdi > 33c: 48 0f 49 c8 cmovns %rax,%rcx > 340: 48 89 c8 mov %rcx,%rax > 343: 5b pop %rbx > 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba> > 34a: 0f 0b ud2 > 34c: eb de jmp 32c <dc_fixpt_recip+0x9c> > 34e: 0f 0b ud2 > 350: 0f 0b ud2 > 352: 66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw > 0x0(%rax,%rax,1) > 359: 1f 84 00 00 00 00 00 If you put them size-by-side, you'll see it's more or less the same code-gen (trivial differences), but now it just stops code-gen, where previously it would continue. So this really is a compiler problem, this needs no annotation, it's straight up broken. Now, the thing is, these ASSERT()s are checking for divide-by-zero, I suspect clang figured that out and invokes UB on us and just stops code-gen. Nathan, Nick, don't we have a compiler flag that forces __builtin_trap() whenever clang pulls something like this? I think UBSAN does this, but we really shouldn't pull in the whole of that for sanity.