On 9/12/24 11:17 AM, Andrii Nakryiko wrote:
On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
Zac Ecob reported a problem where a bpf program may cause a kernel crash
due to the following error:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
The failure is due to the below signed divide:
LLONG_MIN/-1, where LLONG_MIN equals -9,223,372,036,854,775,808.
LLONG_MIN/-1 is supposed to give the positive number 9,223,372,036,854,775,808,
but that is impossible since, on a 64-bit system, the maximum positive
signed number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
cause a kernel exception. On arm64, the result of LLONG_MIN/-1 is
LLONG_MIN.
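As a standalone illustration (this program is mine, not part of the report),
the same trap can be reproduced in user space on x86_64, where it shows up as
SIGFPE; on arm64 the same program simply prints LLONG_MIN:

#include <limits.h>
#include <stdio.h>

int main(void)
{
        /* volatile keeps the compiler from folding the division away */
        volatile long long dividend = LLONG_MIN;
        volatile long long divisor = -1;

        /* The mathematical result (2^63) does not fit in a signed 64-bit
         * integer.  On x86_64 the idiv instruction raises #DE, which the
         * kernel reports as the "divide error" Oops quoted above (SIGFPE
         * in user space); arm64 sdiv returns LLONG_MIN instead.
         */
        printf("%lld\n", dividend / divisor);
        return 0;
}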
Further investigation found that all the following sdiv/smod cases may
trigger an exception when a bpf program runs on the x86_64 platform:
- LLONG_MIN/-1 for 64bit operation
- INT_MIN/-1 for 32bit operation
- LLONG_MIN%-1 for 64bit operation
- INT_MIN%-1 for 32bit operation
where -1 can be an immediate or in a register.
On arm64, there are no exceptions:
- LLONG_MIN/-1 = LLONG_MIN
- INT_MIN/-1 = INT_MIN
- LLONG_MIN%-1 = 0
- INT_MIN%-1 = 0
where -1 can be an immediate or in a register.
Insn patching is needed to handle the above cases, and the patched code
produces results aligned with the arm64 results above.
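To make the intended semantics concrete, the results the patched instructions
should produce (matching arm64 and BPF's existing divide-by-zero rules) can be
summarized in C as follows; these helpers are purely illustrative and are not
part of the patch. The 32-bit cases are analogous with INT_MIN.

/* Illustrative only: target semantics of "rY s/= rX" and "rY s%= rX".
 * The real patch emits a BPF_NEG instruction for the -1 case, whose BPF
 * semantics are two's-complement wrap, so LLONG_MIN stays LLONG_MIN
 * without relying on C-level signed-overflow behavior.
 */
static long long sdiv64_semantics(long long dividend, long long divisor)
{
        if (divisor == 0)
                return 0;               /* BPF defines division by zero as 0 */
        if (divisor == -1)
                return -dividend;       /* LLONG_MIN sdiv -1 -> LLONG_MIN */
        return dividend / divisor;
}

static long long smod64_semantics(long long dividend, long long divisor)
{
        if (divisor == 0)
                return dividend;        /* BPF keeps dst unchanged for mod by 0 */
        if (divisor == -1)
                return 0;               /* LLONG_MIN smod -1 -> 0 */
        return dividend % divisor;
}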
[1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
Reported-by: Zac Ecob <zacecob@xxxxxxxxxxxxxx>
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 80 insertions(+), 4 deletions(-)
Changelog:
v1 -> v2:
- Handle more crash cases like 32-bit operations and mod operations.
- Add more tests to test new cases.
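For the second changelog item, a test for the new 64-bit register case could
follow the conventions of the existing
tools/testing/selftests/bpf/progs/verifier_sdiv.c (a cpuv4-capable clang is
assumed, as for the existing tests in that file). The sketch below, including
the function name and expected return value, is mine and not one of the tests
actually added by this series:

/* Drops into verifier_sdiv.c, which already includes <linux/bpf.h>,
 * <bpf/bpf_helpers.h> and "bpf_misc.h".
 */
SEC("socket")
__description("SDIV64, LLONG_MIN s/= -1 stays LLONG_MIN")
__success __retval(1)
__naked void sdiv64_llong_min_by_minus_one(void)
{
        asm volatile ("                                 \
        r1 = 0x8000000000000000 ll;                     \
        r2 = -1;                                        \
        r1 s/= r2;                                      \
        r3 = 0x8000000000000000 ll;                     \
        r0 = 1;                                         \
        if r1 == r3 goto l0_%=;                         \
        r0 = 0;                                         \
l0_%=:  exit;                                           \
"       ::: __clobber_all);
}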
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f35b80c16cda..ad7f51302c70 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
- /* Make divide-by-zero exceptions impossible. */
+ /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
+ if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
+ insn->off == 1 && insn->imm == -1) {
+ bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+ bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ struct bpf_insn *patchlet;
+ struct bpf_insn chk_and_div[] = {
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ };
+ struct bpf_insn chk_and_mod[] = {
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ };
+
+ patchlet = isdiv ? chk_and_div : chk_and_mod;
nit: "chk_and_" part in the name is misleading, it's more like
"safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
probably not a bad idea to have that in the name as well.
good idea. Will use chk_and_sdiv and chk_and_smod.
+ cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
+
+ new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto next_insn;
+ }
+
+ /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ bool is_sdiv = isdiv && insn->off == 1;
+ bool is_smod = !isdiv && insn->off == 1;
struct bpf_insn *patchlet;
struct bpf_insn chk_and_div[] = {
/* [R,W]x div 0 -> 0 */
@@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
BPF_JMP_IMM(BPF_JA, 0, 0, 1),
BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
};
+ struct bpf_insn chk_and_sdiv[] = {
+ /* [R,W]x sdiv 0 -> 0 */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, 0),
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+ /* LLONG_MIN sdiv -1 -> LLONG_MIN
+ * INT_MIN sdiv -1 -> INT_MIN
+ */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, -1),
+ /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
I don't know how much it actually matters, but it feels like the common
safe case should be as straight-line-executed as possible, no?
So maybe it's better to rearrange to roughly this (where rX is the
divisor register):
if rX == 0 goto L1
if rX == -1 goto L2
rY /= rX
goto L3
L1: /* zero case */
rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
L2: /* negative one case (or zero) */
rY = -rY
L3:
... the rest of the program code ...
My previous patched insns try to clearly separate the rX == 0 and
rX == -1 cases. They use 2 cond jmps, 2 uncond jmps and
3 alu operations. The above one removes one uncond jmp, which
is indeed better.
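For concreteness, that straight-line arrangement could be spelled with the
verifier's insn macros roughly as below, for the 64-bit sdiv-by-register case
only; the array name, ordering and jump offsets are my reading of the
suggestion, not code from the series. The 32-bit and smod variants would use
the BPF_JMP32/BPF_ALU class bits and their own fixup insns.

        struct bpf_insn sdiv_straight[] = {
                /* if rX == 0 goto L1 (the XOR, three insns ahead) */
                BPF_JMP_IMM(BPF_JEQ, insn->src_reg, 0, 3),
                /* if rX == -1 goto L2 (the NEG, three insns ahead) */
                BPF_JMP_IMM(BPF_JEQ, insn->src_reg, -1, 3),
                /* common case: rY s/= rX, fall straight through */
                *insn,
                /* goto L3 (skip the two fixup insns) */
                BPF_JMP_IMM(BPF_JA, 0, 0, 2),
                /* L1: zero divisor: rY = 0, then fall through to the negate */
                BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
                /* L2: divisor -1: rY = -rY (negating 0 is harmless;
                 * BPF_NEG(LLONG_MIN) wraps back to LLONG_MIN)
                 */
                BPF_RAW_INSN(BPF_ALU64 | BPF_OP(BPF_NEG) | BPF_K,
                             insn->dst_reg, 0, 0, 0),
                /* L3: the next program insn follows the patchlet */
        };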
Those two branches in the common case are still annoyingly inefficient; I
wonder if we should do
rX += 1 /* [-1, 0] -> [0, 1] */
if rX <=(unsigned) 1 goto L1
rX -= 1 /* restore original divisor */
rY /= rX /* common case */
goto L3
L1:
if rX == 0 goto L2 /* jump if originally -1 */
rY = 0 /* division by zero case */
L2: /* fallthrough */
rY = -rY
rX -= 1 /* restore original divisor */
L3:
... continue with the rest ...
It's a bit trickier to follow, but should be faster in the common case.
WDYT? Too much? Too far?
This is even better. The above rX -= 1 can be removed if we use
BPF_REG_AX as the temporary register. For example,
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1] */
if tmp <=(unsigned) 1 goto L1
rY /= rX /* common case */
goto L3
L1:
if tmp == 0 goto L2 /* jump if originally -1 */
rY = 0 /* division by zero case */
L2: /* fallthrough */
rY = -rY
L3:
... continue with the rest ...
Maybe we can do even better:
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1] */
if tmp >(unsigned) 1 goto L2
if tmp == 0 goto L1
rY = 0
L1:
rY = -rY;
goto L3
L2:
rY /= rX
L3:
Could this be even better by reducing one uncond jmp in the fast path?
+ *insn,
+ };
+ struct bpf_insn chk_and_smod[] = {
+ /* [R,W]x mod 0 -> [R,W]x */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, 0),
+ BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+ /* [R,W]x mod -1 -> 0 */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, -1),
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ *insn,
+ };
Same idea here, keep the common case as straight as possible.
Sure. Will do.
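Spelled out with the same insn macros, the BPF_REG_AX-based, common-case-first
layout from the discussion above might look roughly like this for the 64-bit
sdiv-by-register case; this is a sketch of the idea only, not code from this
or any later revision of the series. The smod patchlet would need its own pair
of fixup insns (dst unchanged for a zero divisor, dst = 0 for -1) and a
slightly different fallthrough arrangement.

        struct bpf_insn sdiv_ax[] = {
                /* tmp = rX; tmp += 1: maps divisors [-1, 0] onto [0, 1] */
                BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
                BPF_ALU64_IMM(BPF_ADD, BPF_REG_AX, 1),
                /* if tmp >(unsigned) 1 goto L2: common case, divisor is
                 * neither 0 nor -1
                 */
                BPF_JMP_IMM(BPF_JGT, BPF_REG_AX, 1, 4),
                /* if tmp == 0 goto L1: divisor was -1, skip the zeroing */
                BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 1),
                /* divisor was 0: rY = 0, fall through to the negate */
                BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
                /* L1: rY = -rY (LLONG_MIN wraps back to LLONG_MIN) */
                BPF_RAW_INSN(BPF_ALU64 | BPF_OP(BPF_NEG) | BPF_K,
                             insn->dst_reg, 0, 0, 0),
                /* goto L3: skip the common-case division */
                BPF_JMP_IMM(BPF_JA, 0, 0, 1),
                /* L2: common case: rY s/= rX */
                *insn,
                /* L3: rest of the program */
        };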
- patchlet = isdiv ? chk_and_div : chk_and_mod;
- cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
- ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ if (is_sdiv) {
+ patchlet = chk_and_sdiv;
+ cnt = ARRAY_SIZE(chk_and_sdiv);
+ } else if (is_smod) {
+ patchlet = chk_and_smod;
+ cnt = ARRAY_SIZE(chk_and_smod);
+ } else {
+ patchlet = isdiv ? chk_and_div : chk_and_mod;
+ cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
+ ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ }
new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
if (!new_prog)
--
2.43.5