On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> Zac Ecob reported [1] a problem where a bpf program may cause a kernel crash
> due to the following error:
>   Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
>
> The failure is due to the following signed divide:
>   LLONG_MIN/-1, where LLONG_MIN equals -9,223,372,036,854,775,808.
> LLONG_MIN/-1 is supposed to give the positive number
> 9,223,372,036,854,775,808, but that is impossible since the maximum
> positive number on a 64-bit system is 9,223,372,036,854,775,807. On
> x86_64, LLONG_MIN/-1 causes a kernel exception. On arm64, the result
> of LLONG_MIN/-1 is LLONG_MIN.
>
> Further investigation found that all of the following sdiv/smod cases
> may trigger an exception when a bpf program runs on the x86_64 platform:
>   - LLONG_MIN/-1 for 64bit operation
>   - INT_MIN/-1 for 32bit operation
>   - LLONG_MIN%-1 for 64bit operation
>   - INT_MIN%-1 for 32bit operation
> where -1 can be an immediate or in a register.
>
> On arm64, there are no exceptions:
>   - LLONG_MIN/-1 = LLONG_MIN
>   - INT_MIN/-1 = INT_MIN
>   - LLONG_MIN%-1 = 0
>   - INT_MIN%-1 = 0
> where -1 can be an immediate or in a register.
>
> Insn patching is needed to handle the above cases, and the patched code
> produces results aligned with the arm64 results above.
>
> [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
>
> Reported-by: Zac Ecob <zacecob@xxxxxxxxxxxxxx>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 80 insertions(+), 4 deletions(-)
>
> Changelogs:
>   v1 -> v2:
>     - Handle more crash cases like 32bit operations and mods.
>     - Add more tests to test new cases.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f35b80c16cda..ad7f51302c70 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>  			/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
>  			insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
>
> -		/* Make divide-by-zero exceptions impossible. */
> +		/* Make sdiv/smod divide-by-minus-one exceptions impossible. */
> +		if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
> +		     insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
> +		     insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
> +		     insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
> +		    insn->off == 1 && insn->imm == -1) {
> +			bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> +			bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> +			struct bpf_insn *patchlet;
> +			struct bpf_insn chk_and_div[] = {
> +				BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> +					     BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> +					     0, 0, 0),
> +			};
> +			struct bpf_insn chk_and_mod[] = {
> +				BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> +			};
> +
> +			patchlet = isdiv ? chk_and_div : chk_and_mod;

nit: the "chk_and_" part of the name is misleading here; these are more
like "safe_div" and "safe_mod". Oh, and they're "sdiv" and "smod"
specific, so probably not a bad idea to have that in the name as well.
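Also, for anyone reading along, a note on why these one-insn patchlets are
enough (not an objection, just spelling it out): sdiv by -1 is plain
negation, and in two's complement negating LLONG_MIN wraps right back to
LLONG_MIN, which is exactly the arm64 result cited above. A quick
standalone userspace sketch of mine (not from the patch; it negates via
unsigned arithmetic because plain -LLONG_MIN is UB in C):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          long long min64 = LLONG_MIN;
          /* wraparound negation, like BPF_NEG; done in unsigned
           * arithmetic since signed overflow is UB in C
           */
          long long neg = (long long)(0ULL - (unsigned long long)min64);

          printf("%lld\n", min64);   /* -9223372036854775808 */
          printf("%lld\n", neg);     /* -9223372036854775808 again */
          return 0;
  }

And the smod counterpart relies on x % -1 being 0 for any x, with the
32-bit XOR also zero-extending the upper half of the destination.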
> +			cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
> +
> +			new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> +			if (!new_prog)
> +				return -ENOMEM;
> +
> +			delta += cnt - 1;
> +			env->prog = prog = new_prog;
> +			insn = new_prog->insnsi + i + delta;
> +			goto next_insn;
> +		}
> +
> +		/* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
>  		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
>  		    insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
>  		    insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
>  		    insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
>  			bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
>  			bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> +			bool is_sdiv = isdiv && insn->off == 1;
> +			bool is_smod = !isdiv && insn->off == 1;
>  			struct bpf_insn *patchlet;
>  			struct bpf_insn chk_and_div[] = {
>  				/* [R,W]x div 0 -> 0 */
> @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>  				BPF_JMP_IMM(BPF_JA, 0, 0, 1),
>  				BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
>  			};
> +			struct bpf_insn chk_and_sdiv[] = {
> +				/* [R,W]x sdiv 0 -> 0 */
> +				BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> +					     BPF_JNE | BPF_K, insn->src_reg,
> +					     0, 2, 0),
> +				BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> +				BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> +				/* LLONG_MIN sdiv -1 -> LLONG_MIN
> +				 * INT_MIN sdiv -1 -> INT_MIN
> +				 */
> +				BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> +					     BPF_JNE | BPF_K, insn->src_reg,
> +					     0, 2, -1),
> +				/* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
> +				BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> +					     BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> +					     0, 0, 0),
> +				BPF_JMP_IMM(BPF_JA, 0, 0, 1),

I don't know how much it actually matters, but it feels like the common
safe case should be as straight-line-executed as possible, no? So maybe
it's better to rearrange to roughly this (where rX is the divisor
register and rY the dividend):

    if rX == 0 goto L1
    if rX == -1 goto L2
    rY /= rX
    goto L3
  L1: /* zero case */
    rY = 0
    /* fallthrough, negation doesn't hurt, but less jumping */
  L2: /* negative one case (or zero) */
    rY = -rY
  L3:
    ... the rest of the program code ...

Those two branches in the common case are still annoyingly inefficient,
so I wonder if we should do

    rX += 1                      /* [-1, 0] -> [0, 1] */
    if rX <=(unsigned) 1 goto L1
    rX -= 1                      /* restore original divisor */
    rY /= rX                     /* common case */
    goto L3
  L1:
    if rX == 0 goto L2           /* jump if originally -1 */
    rY = 0                       /* division by zero case */
  L2: /* fallthrough */
    rY = -rY
    rX -= 1                      /* restore original divisor */
  L3:
    ... continue with the rest ...

It's a bit trickier to follow, but should be faster in the common case.
WDYT? Too much too far?

> +				*insn,
> +			};
> +			struct bpf_insn chk_and_smod[] = {
> +				/* [R,W]x mod 0 -> [R,W]x */
> +				BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> +					     BPF_JNE | BPF_K, insn->src_reg,
> +					     0, 2, 0),
> +				BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> +				BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> +				/* [R,W]x mod -1 -> 0 */
> +				BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> +					     BPF_JNE | BPF_K, insn->src_reg,
> +					     0, 2, -1),
> +				BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> +				BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> +				*insn,
> +			};
>

Same idea here, keep the common case as straight as possible.
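To make the first variant concrete, here's roughly what I mean for the
sdiv-by-register patchlet (an untested sketch on my side, reusing the
names from the patch; the jump offsets assume exactly this six-insn
layout):

  struct bpf_insn chk_and_sdiv[] = {
          /* common case: divisor is neither 0 nor -1, fall through */
          BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
                       BPF_JEQ | BPF_K, insn->src_reg,
                       0, 3, 0),
          BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
                       BPF_JEQ | BPF_K, insn->src_reg,
                       0, 3, -1),
          *insn,
          BPF_JMP_IMM(BPF_JA, 0, 0, 2),
          /* rX == 0: zero the result, then fall through; negating
           * zero below is a no-op but saves a jump
           */
          BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
          /* rX == -1: sdiv by -1 is wraparound negation */
          BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
                       BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
                       0, 0, 0),
  };

The smod flavor would be analogous (mov32 or nothing for the zero case,
xor for the -1 case). And for the +1/unsigned-compare variant, one way
to avoid restoring rX at all might be to copy the divisor into the
BPF_REG_AX scratch register and do the increment and compare there.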
> -			patchlet = isdiv ? chk_and_div : chk_and_mod;
> -			cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> -				      ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> +			if (is_sdiv) {
> +				patchlet = chk_and_sdiv;
> +				cnt = ARRAY_SIZE(chk_and_sdiv);
> +			} else if (is_smod) {
> +				patchlet = chk_and_smod;
> +				cnt = ARRAY_SIZE(chk_and_smod);
> +			} else {
> +				patchlet = isdiv ? chk_and_div : chk_and_mod;
> +				cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> +					      ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> +			}
>
>  			new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
>  			if (!new_prog)
> --
> 2.43.5
>
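One last thought, since the v2 changelog mentions new tests: something
along these lines should exercise the 64-bit reg/reg case end-to-end.
This is a hypothetical sketch in the selftests' verifier_sdiv.c style,
macro names from memory, and it assumes the cpuv4/clang-18 guards that
file already carries:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>
  #include "bpf_misc.h"

  SEC("socket")
  __description("SDIV64, LLONG_MIN sdiv -1")
  __success __retval(1)
  __naked void sdiv64_llong_min_by_minus_one(void)
  {
          asm volatile ("                                 \
          r0 = 0x8000000000000000 ll;                     \
          r1 = -1;                                        \
          r0 s/= r1;                                      \
          r2 = 0x8000000000000000 ll;                     \
          if r0 == r2 goto l0_%=;                         \
          r0 = 0;                                         \
          exit;                                           \
  l0_%=:  r0 = 1;                                         \
          exit;                                           \
  "       ::: __clobber_all);
  }

  char _license[] SEC("license") = "GPL";

plus the 32-bit, smod, and immediate-divisor variants to cover the
whole matrix from the commit message.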