On Tue, Sep 10, 2024 at 2:53 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Tue, Sep 10, 2024 at 12:32 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > > > > > On 9/10/24 11:25 AM, Alexei Starovoitov wrote: > > > On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > >> > > >> On 9/10/24 8:21 AM, Alexei Starovoitov wrote: > > >>> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > >>>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote: > > >>>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@xxxxxxxxxxxxxx> wrote: > > >>>>>> Hello, > > >>>>>> > > >>>>>> I recently received a kernel 'oops' about a divide error. > > >>>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'. > > >>>>>> > > >>>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding). > > >>>>>> > > >>>>>> > > >>>>>> Apologies if this is already known / not a relevant concern. > > >>>>> Thanks for the report. This is a new issue. > > >>>>> > > >>>>> Yonghong, > > >>>>> > > >>>>> it's related to the new signed div insn. > > >>>>> It sounds like we need to update chk_and_div[] part of > > >>>>> the verifier to account for signed div differently. > > >>>> In verifier, we have > > >>>> /* [R,W]x div 0 -> 0 */ > > >>>> /* [R,W]x mod 0 -> [R,W]x */ > > >>> the verifier is doing what hw does. In this case this is arm64 behavior. > > >> Okay, I see. I tried on a arm64 machine it indeed hehaves like the above. > > >> > > >> # uname -a > > >> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug 1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux > > >> # cat t2.c > > >> #include <stdio.h> > > >> #include <limits.h> > > >> int main(void) { > > >> volatile long long a = 5; > > >> volatile long long b = 0; > > >> printf("a/b = %lld\n", a/b); > > >> return 0; > > >> } > > >> # cat t3.c > > >> #include <stdio.h> > > >> #include <limits.h> > > >> int main(void) { > > >> volatile long long a = 5; > > >> volatile long long b = 0; > > >> printf("a%%b = %lld\n", a%b); > > >> return 0; > > >> } > > >> # gcc -O2 t2.c && ./a.out > > >> a/b = 0 > > >> # gcc -O2 t3.c && ./a.out > > >> a%b = 5 > > >> > > >> on arm64, clang18 compiled binary has the same result > > >> > > >> # clang -O2 t2.c && ./a.out > > >> a/b = 0 > > >> # clang -O2 t3.c && ./a.out > > >> a%b = 5 > > >> > > >> The same source code, compiled on x86_64 with -O2 as well, > > >> it generates: > > >> Floating point exception (core dumped) > > >> > > >>>> What the value for > > >>>> Rx_a sdiv Rx_b -> ? > > >>>> where Rx_a = INT64_MIN and Rx_b = -1? > > >>> Why does it matter what Rx_a contains ? > > >> It does matter. See below: > > >> > > >> on arm64: > > >> > > >> # cat t1.c > > >> #include <stdio.h> > > >> #include <limits.h> > > >> int main(void) { > > >> volatile long long a = LLONG_MIN; > > >> volatile long long b = -1; > > >> printf("a/b = %lld\n", a/b); > > >> return 0; > > >> } > > >> # clang -O2 t1.c && ./a.out > > >> a/b = -9223372036854775808 > > >> # gcc -O2 t1.c && ./a.out > > >> a/b = -9223372036854775808 > > >> > > >> So the result of a/b is LLONG_MIN > > >> > > >> The same code will cause exception on x86_64: > > >> > > >> $ uname -a > > >> Linux ... #1 SMP Wed Jun 5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux > > >> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out > > >> Floating point exception (core dumped) > > >> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out > > >> Floating point exception (core dumped) > > >> > > >> So this is what we care about. > > >> > > >> So I guess we can follow arm64 result too. > > >> > > >>> What cpus do in this case? > > >> See above. arm64 produces *some* result while x64 cause exception. > > >> We do need to special handle for LLONG_MIN/(-1) case. > > > My point about Rx_a that idiv will cause out-of-range exception > > > for many other values than Rx_a == INT64_MIN. > > > I'm not sure that divisor -1 is the only such case either. > > > Probably is, since intuitively -2 and all other divisors should fit fine. > > > So the check likely needs Rx_b == -1 and a check for high bit in Rx_a ? > > > > Looks like only Rx_a == INT64_MIN may cause the problem. > > All other Rx_a numbers (from INT64_MIN+1 to INT64_MAX) > > should be okay. Some selective testing below on x64 host: > > > > $ cat t5.c > > #include <stdio.h> > > #include <limits.h> > > > > unsigned long long res; > > int main(void) { > > volatile long long a; > > long long i; > > for (i = LLONG_MIN + 1; i <= LLONG_MIN + 100; i++) { > > volatile long long b = -1; > > a = i; > > res += (unsigned long long)(a/b); > > } > > for (i = LLONG_MAX - 100; i <= LLONG_MAX - 1; i++) { > > Changing this test to i <= LLONG_MAX > and compiling with gcc -O0 or clang -O2 or clang -O0 > is causing an exception, > because 'a' becomes LLONG_MIN. > Compilers are doing some odd code gen. > I don't understand how 'i' can wrap this way. > > > volatile long long b = -1; > > a = i; > > res += (unsigned long long)(a/b); > > } > > printf("res = %llx\n", res); > > return 0; > > } > > $ gcc -O2 t5.c && ./a.out > > res = 64 > > > > So I think it should be okay if the range is from LLONG_MIN + 1 > > to LLONG_MAX - 1. > > > > Now for LLONG_MAX/(-1) > > > > $ cat t6.c > > #include <stdio.h> > > #include <limits.h> > > int main(void) { > > volatile long long a = LLONG_MAX; > > volatile long long b = -1; > > printf("a/b = %lld\n", a/b); > > return 0; > > } > > $ gcc -O2 t6.c && ./a.out > > a/b = -9223372036854775807 > > > > It is okay too. So I think LLONG_MIN/(-1) is the only case > > we should take care of. > > The test shows that that's the case, but I still can wrap > my head around that only LLONG_MIN/(-1) is a problem. > > Any math experts can explain this? > Not a math expert, but this is because LLONG_MIN / (-1) needs to be -LLONG_MIN, right? But -LLONG_MIN is not representable in 2-complement representation, because positive and negative sides are not "symmetrical": LLONG_MIN = -9,223,372,036,854,775,808 LLONG_MAX= 9,223,372,036,854,775,807 -LLONG_MIN would be 9,223,372,036,854,775,808, which is beyond the representable range for 64-bit signed integer. That's why Dave asked about BPF_NEG for LLONG_MIN, it's a similar problem, its result is unrepresentable value. So in practice -LLONG_MIN == LLONG_MIN :) $ cat main.c #include <stdio.h> #include <stdint.h> int main() { long long x = INT64_MIN; printf("%lld %llx %llx\n", x, x, -x); return 0; } $ cc main.c && ./a.out -9223372036854775808 8000000000000000 8000000000000000