On 11/26/21 8:50 AM, Maxim Mikityanskiy wrote:
On 2021-11-26 07:43, Yonghong Song wrote:
On 11/25/21 6:34 AM, Maxim Mikityanskiy wrote:
On 2021-11-09 09:11, Yonghong Song wrote:
On 11/3/21 7:02 AM, Maxim Mikityanskiy wrote:
On 2021-11-03 04:10, Yonghong Song wrote:
On 11/1/21 4:14 AM, Maxim Mikityanskiy wrote:
On 2021-10-20 19:16, Toke Høiland-Jørgensen wrote:
Lorenz Bauer <lmb@xxxxxxxxxxxxxx> writes:
+bool cookie_init_timestamp_raw(struct tcphdr *th, __be32
*tsval, __be32 *tsecr)
I'm probably missing context, Is there something in this
function that
means you can't implement it in BPF?
I was about to reply with some other comments but upon closer
inspection
I ended up at the same conclusion: this helper doesn't seem to
be needed
at all?
After trying to put this code into BPF (replacing the underlying
ktime_get_ns with ktime_get_mono_fast_ns), I experienced issues
with passing the verifier.
In addition to comparing ptr to end, I had to add checks that
compare ptr to data_end, because the verifier can't deduce that
end <= data_end. More branches will add a certain slowdown (not
measured).
A more serious issue is the overall program complexity. Even
though the loop over the TCP options has an upper bound, and the
pointer advances by at least one byte every iteration, I had to
limit the total number of iterations artificially. The maximum
number of iterations that makes the verifier happy is 10. With
more iterations, I have the following error:
BPF program is too large. Processed 1000001 insn
processed 1000001 insns (limit 1000000)
max_states_per_insn 29 total_states 35489 peak_states 596
mark_read 45
I assume that BPF_COMPLEXITY_LIMIT_INSNS (1 million) is the
accumulated amount of instructions that the verifier can process
in all branches, is that right? It doesn't look realistic that my
program can run 1 million instructions in a single run, but it
might be that if you take all possible flows and add up the
instructions from these flows, it will exceed 1 million.
The limitation of maximum 10 TCP options might be not enough,
given that valid packets are permitted to include more than 10
NOPs. An alternative of using bpf_load_hdr_opt and calling it
three times doesn't look good either, because it will be about
three times slower than going over the options once. So maybe
having a helper for that is better than trying to fit it into BPF?
One more interesting fact is the time that it takes for the
verifier to check my program. If it's limited to 10 iterations,
it does it pretty fast, but if I try to increase the number to 11
iterations, it takes several minutes for the verifier to reach 1
million instructions and print the error then. I also tried
grouping the NOPs in an inner loop to count only 10 real options,
and the verifier has been running for a few hours without any
response. Is it normal?
Maxim, this may expose a verifier bug. Do you have a reproducer I
can access? I would like to debug this to see what is the root
case. Thanks!
Thanks, I appreciate your help in debugging it. The reproducer is
based on the modified XDP program from patch 10 in this series.
You'll need to apply at least patches 6, 7, 8 from this series to
get new BPF helpers needed for the XDP program (tell me if that's a
problem, I can try to remove usage of new helpers, but it will
affect the program length and may produce different results in the
verifier).
See the C code of the program that passes the verifier (compiled
with clang version 12.0.0-1ubuntu1) in the bottom of this email. If
you increase the loop boundary from 10 to at least 11 in
cookie_init_timestamp_raw(), it fails the verifier after a few
minutes.
I tried to reproduce with latest llvm (llvm-project repo),
loop boundary 10 is okay and 11 exceeds the 1M complexity limit. For
10,
the number of verified instructions is 563626 (more than 0.5M) so it is
totally possible that one more iteration just blows past the limit.
So, does it mean that the verifying complexity grows exponentially
with increasing the number of loop iterations (options parsed)?
Depending on verification time pruning results, it is possible
slightly increase number of branches could result quite some (2x, 4x,
etc.) of
to-be-verified dynamic instructions.
Is it at least theoretically possible to make this coefficient below 2x?
I.e. write a loop, so that adding another iteration will not double the
number of verified instructions, but will have a smaller increase?
If that's not possible, then it looks like BPF can't have loops bigger
than ~19 iterations (2^20 > 1M), and this function is not implementable
in BPF.
This is the worst case. As I mentioned pruning plays a huge role in
verification. Effective pruning can add little increase of dynamic
instructions say from 19 iterations to 20 iterations. But we have
to look at verifier log to find out whether pruning is less effective or
something else... Based on my experience, in most cases, pruning is
quite effective. But occasionally it is not... You can look at
verifier.c file to roughly understand how pruning work.
Not sure whether in this case it is due to less effective pruning or
inherently we just have to go through all these dynamic instructions for
verification.
Is it a good enough reason to keep this code as a BPF helper, rather
than trying to fit it into the BPF program?
Another option is to use global function, which is verified separately
from the main bpf program.
Simply removing __always_inline didn't change anything. Do I need to
make any other changes? Will it make sense to call a global function in
a loop, i.e. will it increase chances to pass the verifier?
global function cannot be static function. You can try
either global function inside the loop or global function
containing the loop. It probably more effective to put loops
inside the global function. You have to do some experiments
to see which one is better.
If you apply this tiny change, it fails the verifier after about 3
hours:
[...]