Re: [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 12 Sep 2024 11:19:46 -0700

On Thu, Sep 12, 2024 at 11:17 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >
> > Zac Ecob reported a problem where a bpf program may cause kernel crash due
> > to the following error:
> >   Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> >
> > The failure is due to the below signed divide:
> >   LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
> > LLONG_MIN/-1 is supposed to give a positive number 9,223,372,036,854,775,808,
> > but it is impossible since for 64-bit system, the maximum positive
> > number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
> > cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
> > LLONG_MIN.
> >
> > Further investigation found all the following sdiv/smod cases may trigger
> > an exception when bpf program is running on x86_64 platform:
> >   - LLONG_MIN/-1 for 64bit operation
> >   - INT_MIN/-1 for 32bit operation
> >   - LLONG_MIN%-1 for 64bit operation
> >   - INT_MIN%-1 for 32bit operation
> > where -1 can be an immediate or in a register.
> >
> > On arm64, there are no exceptions:
> >   - LLONG_MIN/-1 = LLONG_MIN
> >   - INT_MIN/-1 = INT_MIN
> >   - LLONG_MIN%-1 = 0
> >   - INT_MIN%-1 = 0
> > where -1 can be an immediate or in a register.
> >
> > Insn patching is needed to handle the above cases and the patched codes
> > produced results aligned with above arm64 result.
> >
> >   [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
> >
> > Reported-by: Zac Ecob <zacecob@xxxxxxxxxxxxxx>
> > Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 80 insertions(+), 4 deletions(-)
> >
> > Changelogs:
> >   v1 -> v2:
> >     - Handle more crash cases like 32bit operation and modules.
> >     - Add more tests to test new cases.
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f35b80c16cda..ad7f51302c70 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> >                         insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
> >
> > -               /* Make divide-by-zero exceptions impossible. */
> > +               /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
> > +               if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
> > +                    insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
> > +                    insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
> > +                    insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
> > +                   insn->off == 1 && insn->imm == -1) {
> > +                       bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> > +                       bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> > +                       struct bpf_insn *patchlet;
> > +                       struct bpf_insn chk_and_div[] = {
> > +                               BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> > +                                            BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> > +                                            0, 0, 0),
> > +                       };
> > +                       struct bpf_insn chk_and_mod[] = {
> > +                               BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > +                       };
> > +
> > +                       patchlet = isdiv ? chk_and_div : chk_and_mod;
>
> nit: "chk_and_" part in the name is misleading, it's more like
> "safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
> probably not a bad idea to have that in the name as well.
>
> > +                       cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
> > +
> > +                       new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> > +                       if (!new_prog)
> > +                               return -ENOMEM;
> > +
> > +                       delta    += cnt - 1;
> > +                       env->prog = prog = new_prog;
> > +                       insn      = new_prog->insnsi + i + delta;
> > +                       goto next_insn;
> > +               }
> > +
> > +               /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
> >                 if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
> >                     insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
> >                     insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
> >                     insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
> >                         bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> >                         bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> > +                       bool is_sdiv = isdiv && insn->off == 1;
> > +                       bool is_smod = !isdiv && insn->off == 1;
> >                         struct bpf_insn *patchlet;
> >                         struct bpf_insn chk_and_div[] = {
> >                                 /* [R,W]x div 0 -> 0 */
> > @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                                 BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> >                                 BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> >                         };
> > +                       struct bpf_insn chk_and_sdiv[] = {
> > +                               /* [R,W]x sdiv 0 -> 0 */
> > +                               BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > +                                            BPF_JNE | BPF_K, insn->src_reg,
> > +                                            0, 2, 0),
> > +                               BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > +                               BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> > +                               /* LLONG_MIN sdiv -1 -> LLONG_MIN
> > +                                * INT_MIN sdiv -1 -> INT_MIN
> > +                                */
> > +                               BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > +                                            BPF_JNE | BPF_K, insn->src_reg,
> > +                                            0, 2, -1),
> > +                               /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
> > +                               BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> > +                                            BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> > +                                            0, 0, 0),
> > +                               BPF_JMP_IMM(BPF_JA, 0, 0, 1),
>
> I don't know how much it actually matters, but it feels like common
> safe case should be as straight-line-executed as possible, no?
>
> So maybe it's better to rearrange to roughly this (where rX is the
> divisor register):
>
>     if rX == 0 goto L1
>     if rX == -1 goto L2
>     rY /= rX
>     goto L3
> L1: /* zero case */
>     rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
> L2: /* negative one case (or zero) */
>     rY = -rY
> L3:
>     ... the rest of the program code ...
>
>
> Those two branches for common case are still annoyingly inefficient, I
> wonder if we should do
>
>     rX += 1 /* [-1, 0] -> [0, 1]
>     if rX <=(unsigned) 1 goto L1
>     rX -= 1 /* restore original divisor */
>     rY /= rX /* common case */
>     goto L3
> L1:
>     if rX == 0 goto L2 /* jump if originally -1 */
>     rY = 0 /* division by zero case */
> L2: /* fallthrough */
>     rY = -rY
>     rX -= 1 /* restore original divisor */
> L3:
>     ... continue with the rest ...

hmm.. just in case rX is the same register as rY, probably best to
restore rX early right at L1: label (and adjust `if rX == 0 goto L2`
into `if rX != 0 goto L2`).

>
>
> It's a bit trickier to follow, but should be faster in a common case.
>
> WDYT? Too much too far?
>
>
> > +                               *insn,
> > +                       };
> > +                       struct bpf_insn chk_and_smod[] = {
> > +                               /* [R,W]x mod 0 -> [R,W]x */
> > +                               BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > +                                            BPF_JNE | BPF_K, insn->src_reg,
> > +                                            0, 2, 0),
> > +                               BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> > +                               BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> > +                               /* [R,W]x mod -1 -> 0 */
> > +                               BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > +                                            BPF_JNE | BPF_K, insn->src_reg,
> > +                                            0, 2, -1),
> > +                               BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > +                               BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> > +                               *insn,
> > +                       };
> >
>
> Same idea here, keep the common case as straight as possible.
>
> > -                       patchlet = isdiv ? chk_and_div : chk_and_mod;
> > -                       cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> > -                                     ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> > +                       if (is_sdiv) {
> > +                               patchlet = chk_and_sdiv;
> > +                               cnt = ARRAY_SIZE(chk_and_sdiv);
> > +                       } else if (is_smod) {
> > +                               patchlet = chk_and_smod;
> > +                               cnt = ARRAY_SIZE(chk_and_smod);
> > +                       } else {
> > +                               patchlet = isdiv ? chk_and_div : chk_and_mod;
> > +                               cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> > +                                             ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> > +                       }
> >
> >                         new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> >                         if (!new_prog)
> > --
> > 2.43.5
> >