On 9/23/21 4:51 PM, Alexei Starovoitov wrote: > On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote: >> The verifier currently logs some useful statistics in >> print_verification_stats. Although the text log is an effective feedback >> tool for an engineer iterating on a single application, it would also be >> useful to enable tracking these stats in a more structured form for >> fleetwide or historical analysis, which this patchset attempts to do. >> >> A concrete motivating usecase which came up in recent weeks: >> >> A team owns a complex BPF program, with various folks extending its >> functionality over the years. An engineer tries to make a relatively >> simple addition but encounters "BPF program is too large. Processed >> 1000001 insn". >> >> Their changes bumped the processed insns from 700k to over the limit and >> there's no obvious way to simplify. They must now consider a large >> refactor in order to incorporate the new feature. What if there was some >> previous change which bumped processed insns from 200k->700k which >> _could_ be modified to stress verifier less? Tracking historical >> verifier stats for each version of the program over the years would >> reduce manual work necessary to find such a change. >> >> >> Although parsing the text log could work for this scenario, a solution >> that's resilient to log format and other verifier changes would be >> preferable. >> >> This patchset adds a bpf_prog_verif_stats struct - containing the same >> data logged by print_verification_stats - which can be retrieved as part >> of bpf_prog_info. Looking for general feedback on approach and a few >> specific areas before fleshing it out further: >> >> * None of my usecases require storing verif_stats for the lifetime of a >> loaded prog, but adding to bpf_prog_aux felt more correct than trying >> to pass verif_stats back as part of BPF_PROG_LOAD >> * The verif_stats are probably not generally useful enough to warrant >> inclusion in fdinfo, but hoping to get confirmation before removing >> that change in patch 1 >> * processed_insn, verification_time, and total_states are immediately >> useful for me, rest were added for parity with >> print_verification_stats. Can remove. >> * Perhaps a version field would be useful in verif_stats in case future >> verifier changes make some current stats meaningless >> * Note: stack_depth stat was intentionally skipped to keep patch 1 >> simple. Will add if approach looks good. > > Sorry for the delay. LPC consumes a lot of mental energy :) > > I see the value of exposing some of the verification stats as prog_info. > Let's look at the list: > struct bpf_prog_verif_stats { > __u64 verification_time; > __u32 insn_processed; > __u32 max_states_per_insn; > __u32 total_states; > __u32 peak_states; > __u32 longest_mark_read_walk; > }; > verification_time is non deterministic. It varies with frequency > and run-to-run. I don't see how alerting tools can use it. Makes sense to me, will get rid of it. > insn_processed is indeed the main verification metric. > By now it's well known and understood. > > max_states_per_insn, total_states, etc were the metrics I've studied > carefully with pruning, back tracking and pretty much every significant > change I did or reiviewed in the verifier. They're useful to humans > and developers, but I don't see how alerting tools will use them. > > So it feels to me that insn_processed alone will be enough to address the > monitoring goal. For the concrete usecase in my original message insn_processed would be enough. For the others - I thought there might be value in gathering those "fleetwide" to inform verifier development, e.g.: "Hmm, this team's libbpf program has been regressing total_states over past few {kernel, llvm} rollouts, but they haven't been modifying it. Let's try to get a minimal repro, send to bpf@vger, and contribute to selftests if it is indeed hitting a weird verifier edge case" So for those I'm not expecting them to be useful to alert on or be a number that the average BPF program writer needs to care about. Of course this is hypothetical as I haven't tried to gather such data and look for interesting patterns. But these metrics being useful to you when looking at significant verifier changes is a good sign. > It can be exposed to fd_info and printed by bpftool. > If/when it changes with some future verifier algorithm we should be able > to approximate it. >