Re: [PATCH 5.10] overflow.h: use new generic division helpers to avoid / operator

On Tue, Sep 14, 2021 at 11:55 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Btw, these kinds of issues is exactly why I've been hardnosed about
> 64-bit divides for decades. 64-bit divides on 32-bit machines are
> *expensive*. It's why I don't like saying "just use '/' and we'll pick
> up the routines from libgcc".

I was going to ask about the history there; not to derail the thread
further, but this is a question whose answer is important to me.

Are the helpers from libgcc insufficient?  I've been working through
https://github.com/ClangBuiltLinux/linux/issues/1438, which came about
because LLVM's equivalent of libgcc, "compiler-rt," has a helper for
the builtin multiply with overflow check that libgcc does not.  As
such, LLVM cannot assume compiler-rt is being linked against, so it
must expand the check inline every time.  And the inline code is HUGE:
https://godbolt.org/z/MM4hPGxTE.  IMO we could do a much better job on
code size (and thus probably get I-cache performance improvements) if
we just linked against the compiler runtime.
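
For illustration, a minimal sketch of the kind of multiply with
overflow check under discussion, assuming a GCC- or Clang-compatible
compiler.  __builtin_mul_overflow is a real builtin; the wrapper name
mul_s64_overflows is hypothetical and not taken from the kernel or
from the linked issue.  Whether the compiler emits a libcall or an
inline expansion for it depends on the target and the toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical wrapper (name is an assumption for this sketch).
 * Stores the truncated product a * b in *res and returns true if
 * the mathematically exact product does not fit in int64_t.
 */
static bool mul_s64_overflows(int64_t a, int64_t b, int64_t *res)
{
	return __builtin_mul_overflow(a, b, res);
}
```

Calling mul_s64_overflows(3, 4, &r) returns false and leaves 12 in r,
while mul_s64_overflows(INT64_MAX, 2, &r) returns true.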

Perhaps the concern is about the quality of the compiler runtime
routines, i.e. that we may have arch-specific implementations that
are better?  64-bit division on 32-bit targets is expensive either
way; I'd rather have the compiler generate a libcall than try to
expand these inline.  I'm not sure if it's the case, but I can't help
but wonder whether other optimization decisions are being made based
on whether the compiler runtime is being linked against or not; it's
hard for the compiler to know what will happen at link time.  It's
vaguely reminiscent of the issues we face with using -ffreestanding.

Switching that now (so that we did link in the compiler runtimes)
would be a massive yak shave, for sure.

> In almost all real-life cases - at least in a kernel - the full divide
> is unnecessary. It's almost always people being silly and lazy, and
> the very expensive operation can be avoided entirely (or at least
> minimized to something like 64/32).

At least when dealing in powers of two, sure.
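
As a rough sketch of the power-of-two case: when the divisor is known
to be a power of two, the 64-bit divide reduces to a shift, which is
cheap even on 32-bit targets.  The helper names below are hypothetical
(not kernel APIs), and __builtin_ctzll assumes a GCC- or
Clang-compatible compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if d is a nonzero power of two. */
static bool is_pow2_u64(uint64_t d)
{
	return d != 0 && (d & (d - 1)) == 0;
}

/*
 * Hypothetical helper: computes n / d without a divide instruction.
 * The caller must guarantee is_pow2_u64(d); __builtin_ctzll(0) is
 * undefined, so d == 0 is not allowed.
 */
static uint64_t div_pow2_u64(uint64_t n, uint64_t d)
{
	/* For d == 1 << k, the trailing-zero count of d is k. */
	return n >> __builtin_ctzll(d);
}
```

For example, div_pow2_u64(1000, 8) yields 125.  Divisors that are not
powers of two still need a real division routine.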
-- 
Thanks,
~Nick Desaulniers


