Re: [RFC] bpf: Rethinking BPF safety, BPF open-coded iterators, and possible improvements (runtime protection)

Juntong Deng <juntong.deng@xxxxxxxxxxx> · Fri, 14 Feb 2025 20:53:43 +0000

On 2025/2/8 02:40, Alexei Starovoitov wrote:
On Tue, Feb 4, 2025 at 4:40 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:

On 2025/2/4 23:59, Alexei Starovoitov wrote:
On Tue, Feb 4, 2025 at 11:35 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:

This discussion comes from the open-coded BPF file iterator patch
series, which was NACKed and thus ended [0].

Thanks for the feedback from Christian, Linus, and Al, all very helpful.

The problems encountered in this patch series may also be encountered in
other BPF open-coded iterators to be added in the future, or in other
BPF usage scenarios.

So maybe this is a good opportunity for us to discuss all of this and
rethink BPF safety, BPF open-coded iterators, and possible improvements.

[0]:
https://lore.kernel.org/bpf/AM6PR03MB50801990BD93BFA2297A123599EC2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#t

What do we expect from BPF safety?
----------------------------------

Christian points out the important fact that BPF programs can hold
references for a long time and cause weird issues.

This is an inherent flaw in BPF. Since the addition of bpf_loop and
BPF open-coded iterators, the myth that BPF is "absolutely" safe has
been broken.

The BPF verifier is a static verifier and has no way of knowing how
long a BPF program will actually run.

For example, the following BPF program can freeze your computer, but
can pass the BPF verifier smoothly.

SEC("raw_tp/sched_switch")
int BPF_PROG(on_switch)
{
          struct bpf_iter_num it;
          int *v;
          bpf_iter_num_new(&it, 0, 100000);
          while ((v = bpf_iter_num_next(&it))) {
                  struct bpf_iter_num it2;
                  bpf_iter_num_new(&it2, 0, 100000);
                  while ((v = bpf_iter_num_next(&it2))) {
                          bpf_printk("BPF Bomb\n");
                  }
                  bpf_iter_num_destroy(&it2);
          }
          bpf_iter_num_destroy(&it);
          return 0;
}

This BPF program runs a huge loop at each schedule.

bpf_iter_num_new is a common iterator that we can use in almost any
context, including LSM, sched-ext, tracing, etc.

We can run large, long loops on any critical code path and freeze the
system, since the BPF verifier has no way of knowing how long the
iteration will run.

This is completely orthogonal to the issue that Christian explained.

Thanks for your reply!

Completely orthogonal? Sorry, I may have misunderstood something.

...

program runs a huge loop at each schedule

You've discovered bpf iterators and said, rephrasing,
"loops can take a long time" and concluded with:
"This is an inherent flaw in BPF".

This kind of rhetoric is not helpful.
People that wanted to abuse bpf powers could have done it 10 years
ago without iterators, loops, etc.
One could create a hash map and populate it with colliding keys to
build long per-bucket linked lists. Though we have a random seed, with
enough persistence the hashtab becomes slow.
Then just do bpf_map_lookup_elem() from the prog.
This was a known issue that is gradually being fixed.
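
A minimal user-space sketch of the effect described above, not the
kernel hashtab: a chained hash table with a toy unseeded hash, where
every name (toy_hash, table_insert, worst_lookup_cost, NBUCKETS) is
hypothetical and chosen only for illustration. It shows how a single
lookup degrades from a short chain walk to a walk over every element
once all keys land in one bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 64

struct node {
	unsigned int key;
	struct node *next;
};

struct table {
	struct node *buckets[NBUCKETS];
};

/* Toy hash (assumption): unseeded modulo, so multiples of NBUCKETS collide. */
unsigned int toy_hash(unsigned int key)
{
	return key % NBUCKETS;
}

/* Insert at the head of the bucket's chain. */
void table_insert(struct table *t, unsigned int key)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->key = key;
	n->next = t->buckets[toy_hash(key)];
	t->buckets[toy_hash(key)] = n;
}

/* Return how many chain nodes are visited while looking up key. */
size_t table_lookup_cost(const struct table *t, unsigned int key)
{
	size_t visited = 0;

	for (const struct node *n = t->buckets[toy_hash(key)]; n; n = n->next) {
		visited++;
		if (n->key == key)
			break;
	}
	return visited;
}

void table_free(struct table *t)
{
	for (size_t b = 0; b < NBUCKETS; b++) {
		struct node *n = t->buckets[b];

		while (n) {
			struct node *next = n->next;

			free(n);
			n = next;
		}
		t->buckets[b] = NULL;
	}
}

/*
 * Insert count keys, either all colliding in bucket 0 or spread evenly,
 * then return the cost of looking up key 0. Key 0 is inserted first, so
 * with head insertion it sits at the tail of its chain.
 */
size_t worst_lookup_cost(size_t count, int collide)
{
	struct table t = { { NULL } };
	size_t cost;

	for (size_t i = 0; i < count; i++)
		table_insert(&t, (unsigned int)(collide ? i * NBUCKETS : i));
	cost = table_lookup_cost(&t, 0);
	table_free(&t);
	return cost;
}
```

With 6400 keys, the spread case walks 100 nodes to find key 0, while
the colliding case walks all 6400. A seeded hash makes such keys harder
to find, which is the "random seed" caveat in the text above.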

Sorry for my inappropriate wording.

Actually I just wanted to give an example to show that the problem has
existed for a long time and exists in other iterators as well...

Sorry for using "inherent flaw in BPF", I should try to help fix it.

Could you please share a link to the patch? I am curious how we can
fix this.

There is no "fix" for the iterator. There is no single patch either.
The issues were discussed over many _years_ in LPC and LSFMM.
Exception logic was a step toward fixing it.
Now we will do "exceptions part 2" or will rip out exceptions completely
and go with the "fast execute" approach.
When either approach works we can add a watchdog (and other mechanisms)
to cancel program execution.
Unlike user space, there is no easy way to SIGKILL a bpf prog.
We have to free up all resources cleanly.

I sent a proof-of-concept patch series [0] that implements low-overhead,
non-intrusive runtime acquire/release tracking.

By replacing the address of the CALL instruction during JIT, BPF runtime
hooks can be implemented.

I hope this patch series will help with the watchdog and resource
auto-release issues.

[0]: 
https://lore.kernel.org/bpf/AM6PR03MB5080513BFAEB54A93CC70D4399FE2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#u
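
The idea of redirecting call targets to tracking hooks can be sketched
in user space as follows. This is only an analogy under stated
assumptions, not the code from the patch series: a small table of
function pointers stands in for the CALL instruction targets that the
JIT would rewrite, and every name here (struct obj, call_table,
install_hooks, release_all_held, run_and_cancel) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 16

/* Hypothetical reference-counted resource. */
struct obj {
	int refcnt;
};

/* The "real" acquire/release functions the program wants to call. */
void obj_acquire(struct obj *o) { o->refcnt++; }
void obj_release(struct obj *o) { o->refcnt--; }

/* Stand-in for the CALL targets of one program (assumption). */
struct call_table {
	void (*acquire)(struct obj *);
	void (*release)(struct obj *);
};

/* References currently held by the program, as seen by the hooks. */
static struct obj *held[MAX_HELD];
static size_t nheld;

/* Hook: perform the real acquire, then record the reference. */
void hooked_acquire(struct obj *o)
{
	obj_acquire(o);
	assert(nheld < MAX_HELD);
	held[nheld++] = o;
}

/* Hook: forget the recorded reference, then perform the real release. */
void hooked_release(struct obj *o)
{
	for (size_t i = 0; i < nheld; i++) {
		if (held[i] == o) {
			held[i] = held[--nheld];
			break;
		}
	}
	obj_release(o);
}

/* Analogue of patching CALL targets at JIT time. */
void install_hooks(struct call_table *ct)
{
	ct->acquire = hooked_acquire;
	ct->release = hooked_release;
}

/* Release everything still held, e.g. after a watchdog cancels the program. */
size_t release_all_held(void)
{
	size_t released = 0;

	while (nheld > 0) {
		obj_release(held[--nheld]);
		released++;
	}
	return released;
}

/*
 * Simulated program: acquires a and b, releases only a, and is then
 * cancelled. Returns how many references the cleanup had to release.
 */
size_t run_and_cancel(struct obj *a, struct obj *b)
{
	struct call_table ct = { obj_acquire, obj_release };

	install_hooks(&ct);
	ct.acquire(a);
	ct.acquire(b);
	ct.release(a);
	/* Program is cancelled here while still holding b. */
	return release_all_held();
}
```

The program body is unchanged by the hooks, which is the non-intrusive
part of the idea; the tracking cost is confined to the redirected
acquire and release calls.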

Yes, I am willing to help, so I included a "Possible improvements"
section.

With rants like "inherent flaw in BPF" it's hard to take
your offer of help seriously.

I am also working on another patch about filters that we discussed
earlier, although it still needs some time.

Please focus on landing that first.