Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > On Thu, Mar 13, 2025 at 10:57 AM Luis Gerhorst <luis.gerhorst@xxxxxx> wrote: >> With increased limits this allows applying mitigations to large BPF >> progs such as the Parca Continuous Profiler's prog. However, this >> requires a jump-seq limit of 256k. In any case, the same principle >> should apply to smaller programs therefore include it even if the limit >> stays at 8k for now. Most programs in [1] only require a limit of 32k. > > Do you mean that without this change the verifier needs 256k > jmp limit to load Parca's prog as unpriv due to speculative > path exploration with push_stack ? > Only with this change Parca is loadable when manually enabling Spectre defenses for privileged programs and setting the following limits: - BPF_COMPLEXITY_LIMIT_JMP_SEQ=256k - BPF_COMPLEXITY_LIMIT_SPEC_V1_VERIFICATION=128k - BPF_COMPLEXITY_LIMIT_INSNS=32M > > And this change uses 4k as a trade-off between prog runtime > and verification time ? > Yes, this change uses 4k to limited nested speculative path exploration. At the top-level (i.e., on architectural paths), spawned speculative paths are still explored. > > But tracing progs use bpf_probe_read_kernel(), so they're never going > to be unpriv. > I'm sorry this was not clear. Parca is only used as an example here to test whether this improves expressiveness in general. If you do not see this as a favorable tradeoff (because of the code complexity), I think it would be acceptable to drop the last patch for now. The biggest improvements is already achieved with the other patches as evident from the selftests. I can do a more exhaustive analysis on patch 11 later. > >> @@ -2010,6 +2011,19 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env, >> struct bpf_verifier_stack_elem *elem; >> int err; >> >> + if (!env->bypass_spec_v1 && >> + cur->speculative && > > Should this be > (cur->speculative || speculative) > ? No, I think it will be unsafe to add `|| speculative` here. If we were to return -EINVAL from push_stack() when pushing a speculative path from a non-speculative path (e.g., in check_cond_jmp_op() through sanitize_speculative_path()), this will cause do_check() to add an lfence before the jump-op. Here's a minimal example program: A = true B = true if A goto e f() if B goto e unsafe() e: exit There are the following speculative and non-speculative paths (`cur->speculative` and `speculative` referring to the value of the push_stack() parameters): - A = true - B = true - if A goto e - A && !cur->speculative && !speculative - exit - !A && !cur->speculative && speculative - f() - if B goto e - B && cur->speculative && !speculative - exit - !B && cur->speculative && speculative - unsafe() `|| speculative` might cause us to only add an lfence before `if A goto e`, which would not be sufficient. Intel recommends adding the lfence after the jump [1]. > > In general I'm not convinced that the approach is safe. > > This recoverable EINVAL means that exploration under speculation > stops early, but there could be more branches and they won't be > sanitized with extra lfence. > So speculative execution can still happen at later insns. > `goto process_bpf_exit` only causes us to stop analyzing this particular path, not the rest of the program. This is based on the assumption, that the lfence stops the CPU from ever reaching those branches (if they are not reachable through other means). > > Similar concern in patch 7: > + if (state->speculative && cur_aux(env)->nospec) > + goto process_bpf_exit; > > One lfence at this insn doesn't stop speculation until the program end. > Only at this insn. The rest of the code is free to speculate. > Taking the program above as an example, this allows us to stop before f() if an lfence was inserted there. Fully patched program would be: A = true B = true if A goto e lfence f() if B goto e unsafe() e: exit In this example, all instructions after the lfence are dead code (and with the lfence they are also dead code speculatively). I believe this is in line with Intel's guidance [1]: An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. Developers might prefer LFENCE over a serializing instruction because LFENCE may have lower latency. Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes. With regards to the example, this implies that `if B goto e` will not execute before `if A goto e` completes. Once `if A goto e` completes, the CPU should find that the speculation was wrong and continue with `exit`. If there is any other path that leads to `if B goto e` (and therefore `unsafe()`) without going through `if A goto e`, then an lfence might of course still be needed there. However, I assume this other path will be explored separately and therefore be discovered by the verifier even if the exploration discussed here stops at the lfence. If this assumption is wrong, please let me know. > > The refactoring in patches 1-3 is nice. > Patches 4-5 are tricky and somewhat questionable, but make sense. > Patch 7 without early goto process_bpf_exit looks correct too, > Patch 8 is borderline. Feels like it's opening the door for > new vulnerabilities and space to explore for security researchers. > We disabled unpriv bpf by default and have no intentions to enable it. > Even if we land the whole thing the unpriv will stay disabled. > So trade offs don't appear favorable. > Thank you very much for having a look. Let me know whether the above resolves your concern. In any case, should I separate patches 1-3 into another series? [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html ("Managed Runtime Speculative Execution Side Channel Mitigations")