Re: [PATCH bpf-next v1 1/2] bpf: force checkpoint when jmp history is too long

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Fri, 18 Oct 2024 13:03:44 +0200

On 10/18/24 4:03 AM, Eduard Zingerman wrote:
A specifically crafted program might trick verifier into growing very
long jump history within a single bpf_verifier_state instance.
Very long jump history makes mark_chain_precision() unreasonably slow,
especially in case if verifier processes a loop.

Mitigate this by forcing new state in is_state_visited() in case if
current state's jump history is too long.

Use same constant as in `skip_inf_loop_check`, but multiply it by
arbitrarily chosen value 2 to account for jump history containing not
only information about jumps, but also information about stack access.
[...]

The log output shows that checkpoint at label (1) is never created,
because it is suppressed by `skip_inf_loop_check` logic:
a. When 'if' at (2) is processed it pushes a state with insn_idx (1)
    onto stack and proceeds to (3);
b. At (5) checkpoint is created, and this resets
    env->{jmps,insns}_processed.
c. Verification proceeds and reaches `exit`;
d. State saved at step (a) is popped from stack and is_state_visited()
    considers if checkpoint needs to be added, but because
    env->{jmps,insns}_processed had been just reset at step (b)
    the `skip_inf_loop_check` logic forces `add_new_state` to false.
e. Verifier proceeds with current state, which slowly accumulates
    more and more entries in the jump history.

The accumulation of entries in the jump history is a problem because
of two factors:
- it eventually exhausts memory available for kmalloc() allocation;
- mark_chain_precision() traverses the jump history of a state,
   meaning that if `r7` is marked precise, verifier would iterate
   ever growing jump history until parent state boundary is reached.

(note: the log also shows a REG INVARIANTS VIOLATION warning
        upon jset processing, but that's another bug to fix).

With this patch applied, the example above is rejected by verifier
under 1s of time, reaching 1M instructions limit.

The program is a simplified reproducer from syzbot report [1].
Previous discussion could be found at [2].
The patch does not cause any changes in verification performance,
when tested on selftests from veristat.cfg and cilium programs taken
from [3].

[1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@xxxxxxxxxx/
[2] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@xxxxxxxxx/
[3] https://github.com/anakryiko/cilium

Impressive that syzbot was able to generate this, and awesome analysis
as well as fix.

I guess we should also add :

Reported-by: syzbot+7e46cdef14bf496a3ab4@xxxxxxxxxxxxxxxxxxxxxxxxx

Can we also add a Fixes tag so that this can eventually be picked up
by stable? bpf tree would be the appropriate target, no?

Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

Acked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>

---
  kernel/bpf/verifier.c | 14 ++++++++++++--
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f514247ba8ba..f64c831a9278 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17873,13 +17873,23 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
  	return false;
  }
  
+#define MAX_JMPS_PER_STATE 20
+
  static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
  {
  	struct bpf_verifier_state_list *new_sl;
  	struct bpf_verifier_state_list *sl, **pprev;
  	struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry;
  	int i, j, n, err, states_cnt = 0;
-	bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx);
+	bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx) ||
+			       /* - Long jmp history hinders mark_chain_precision performance,
+				*   so force new state if jmp history of current state exceeds
+				*   a threshold.
+				* - Jmp history records not only jumps, but also stack access,
+				*   so keep this constant 2x times the limit imposed on
+				*   env->jmps_processed for loop cases (see skip_inf_loop_check).
+				*/
+			       cur->jmp_history_cnt > MAX_JMPS_PER_STATE * 2;
  	bool add_new_state = force_new_state;
  	bool force_exact;
  
@@ -18023,7 +18033,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
  			 */
  skip_inf_loop_check:
  			if (!force_new_state &&
-			    env->jmps_processed - env->prev_jmps_processed < 20 &&
+			    env->jmps_processed - env->prev_jmps_processed < MAX_JMPS_PER_STATE &&
  			    env->insn_processed - env->prev_insn_processed < 100)
  				add_new_state = false;
  			goto miss;