On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote: > The verifier currently logs some useful statistics in > print_verification_stats. Although the text log is an effective feedback > tool for an engineer iterating on a single application, it would also be > useful to enable tracking these stats in a more structured form for > fleetwide or historical analysis, which this patchset attempts to do. > > A concrete motivating usecase which came up in recent weeks: > > A team owns a complex BPF program, with various folks extending its > functionality over the years. An engineer tries to make a relatively > simple addition but encounters "BPF program is too large. Processed > 1000001 insn". > > Their changes bumped the processed insns from 700k to over the limit and > there's no obvious way to simplify. They must now consider a large > refactor in order to incorporate the new feature. What if there was some > previous change which bumped processed insns from 200k->700k which > _could_ be modified to stress verifier less? Tracking historical > verifier stats for each version of the program over the years would > reduce manual work necessary to find such a change. > > > Although parsing the text log could work for this scenario, a solution > that's resilient to log format and other verifier changes would be > preferable. > > This patchset adds a bpf_prog_verif_stats struct - containing the same > data logged by print_verification_stats - which can be retrieved as part > of bpf_prog_info. Looking for general feedback on approach and a few > specific areas before fleshing it out further: > > * None of my usecases require storing verif_stats for the lifetime of a > loaded prog, but adding to bpf_prog_aux felt more correct than trying > to pass verif_stats back as part of BPF_PROG_LOAD > * The verif_stats are probably not generally useful enough to warrant > inclusion in fdinfo, but hoping to get confirmation before removing > that change in patch 1 > * processed_insn, verification_time, and total_states are immediately > useful for me, rest were added for parity with > print_verification_stats. Can remove. > * Perhaps a version field would be useful in verif_stats in case future > verifier changes make some current stats meaningless > * Note: stack_depth stat was intentionally skipped to keep patch 1 > simple. Will add if approach looks good. Sorry for the delay. LPC consumes a lot of mental energy :) I see the value of exposing some of the verification stats as prog_info. Let's look at the list: struct bpf_prog_verif_stats { __u64 verification_time; __u32 insn_processed; __u32 max_states_per_insn; __u32 total_states; __u32 peak_states; __u32 longest_mark_read_walk; }; verification_time is non deterministic. It varies with frequency and run-to-run. I don't see how alerting tools can use it. insn_processed is indeed the main verification metric. By now it's well known and understood. max_states_per_insn, total_states, etc were the metrics I've studied carefully with pruning, back tracking and pretty much every significant change I did or reiviewed in the verifier. They're useful to humans and developers, but I don't see how alerting tools will use them. So it feels to me that insn_processed alone will be enough to address the monitoring goal. It can be exposed to fd_info and printed by bpftool. If/when it changes with some future verifier algorithm we should be able to approximate it.