Martin KaFai Lau <kafai@xxxxxx> writes: > On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote: >> I tend to agree with Toke here that this is not generic. What has been tried >> to improve the verifier instead before submitting the series? It would be much >> more preferable to improve the developer experience with regards to a generic >> solution, so that other/similar problems can be tackled in one go as well such >> as IP options, extension headers, etc. > It would be nice to improve verifier to recognize it more smoothly. Would > love to hear idea how to do it. So as far as I could tell, the verifier blows up in part because when there's multiple bounded loops in sequence the verifier gets into a combinatorial explosion of exploring all paths through the first loop combined with all paths through the second. So if we could teach the verifier to recognise that each loop is a separate entity to avoid this, I think looping through headers would be a lot easier. As you can probably tell, though, there is quite a bit of handwaving in the above, and I have no idea how to actually do this. Some kind of invariant analysis, maybe? But is this possible in general? > When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt() > is needed to ensure the header option is sane. When writing test to parse > variable length header option, I also pulled in tricks (e.g. "#pragma unroll" > is easier to get it work. Tried bounded loop but then hits max insns and > then moved some cases into subprog...etc). Most (if not all) TCP headers > has some options (e.g. tstamp), so it will be useful to have an easy way > to search a particular option and bpf_load_hdr_opt() was also added to > bpf_sockops. So if we can't fix the verifier, maybe we could come up with a more general helper for packet parsing? Something like: bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) { ptr = ctx->data + offset; while (ptr < ctx->data_end) { offset = callback_fn(ptr, ctx->data_end, callback_arg); if (offset == 0) return 0; ptr += offset; } // out of bounds before callback was done return -EINVAL; } This would work for parsing any kind of packet header or TLV-style data without having to teach the kernel about each header type. It'll have quite a bit of overhead if all the callbacks happen via indirect calls, but maybe the verifier can inline the calls (or at least turn them into direct CALL instructions)? -Toke