On Sun, Jul 25, 2021 at 6:01 PM Yonghong Song <yhs@xxxxxx> wrote: > > > > On 7/25/21 8:22 AM, Vincent Li wrote: > > > > > > > > On Sat, 24 Jul 2021, Vincent Li wrote: > > > >> > >> > >> On Sat, 24 Jul 2021, Vincent Li wrote: > >> > >>> On Fri, Jul 23, 2021 at 7:17 PM Vincent Li <vincent.mc.li@xxxxxxxxx> wrote: > >>>> > >>>> > >>>> Hi BPF experts, > >>>> > >>>> I have a cilium PR https://github.com/cilium/cilium/pull/16916 that > >>>> failed to pass verifier in kernel 4.19, the error is like: > >>>> > >>>> level=warning msg="Prog section '2/7' rejected: Argument list too long > >>>> (7)!" subsys=datapath-loader > >>>> level=warning msg=" - Type: 3" subsys=datapath-loader > >>>> level=warning msg=" - Attach Type: 0" subsys=datapath-loader > >>>> level=warning msg=" - Instructions: 4578 (482 over limit)" > >>>> subsys=datapath-loader > >>>> level=warning msg=" - License: GPL" subsys=datapath-loader > >>>> level=warning subsys=datapath-loader > >>>> level=warning msg="Verifier analysis:" subsys=datapath-loader > >>>> level=warning subsys=datapath-loader > >>>> level=warning msg="Error filling program arrays!" subsys=datapath-loader > >>>> level=warning msg="Unable to load program" subsys=datapath-loader > >>>> > >>>> then I tried to run the PR locally in my dev machine with custom upstream > >>>> kernel version, I narrowed the issue down to between upstream kernel > >>>> version 5.7 and 5.8, in 5.7, it failed with: > >>> > >>> I further narrow it down to between 5.7 and 5.8-rc1 release, but still > >>> no clue which commits in 5.8-rc1 resolved the issue > >>> > >>>> > >>>> level=warning msg="processed 50 insns (limit 1000000) max_states_per_insn > >>>> 0 total_states 1 peak_states 1 mark_read 1" subsys=datapath-loader > >>>> level=warning subsys=datapath-loader > >>>> level=warning msg="Log buffer too small to dump verifier log 16777215 > >>>> bytes (9 tries)!" subsys=datapath-loader > > The error message is "Log buffer too small to dump verifier log 16777215 > bytes (9 tries)!". > > Commit 6f8a57ccf8511724e6f48d732cb2940889789ab2 made the default log > much shorter. So it fixed the above log buffer too small issue. > Thank you for the confirmation, after I remove 'verbose' log, indeed the problem went away for kernel 5.x- 5.8, but the "Prog section '2/7' rejected: Argument list too long.." issue persisted even after I remove the "verbose" logging for kernel version 4.19, any clue on that? > >>>> level=warning msg="Error filling program arrays!" subsys=datapath-loader > >>>> level=warning msg="Unable to load program" subsys=datapath-loader > >>>> > >>>> 5.8 works fine. > >>>> > >>>> What difference between 5.7 and 5.8 to cause this verifier problem, I > >>>> tried to git log v5.7..v5.8 kernel/bpf/verifier, I could not see commits > >>>> that would make the difference with my limited BPF knowledge. Any clue > >>>> would be appreciated! > >> > >> I have git bisected to this commit: > >> > >> # first fixed commit: [6f8a57ccf8511724e6f48d732cb2940889789ab2] bpf: Make > >> verifier log more relevant by default > > > > both the cilium github PR test and my local dev machine PR test has the > > verbose set, for example, my local test has: > > > > diff --git a/pkg/datapath/loader/netlink.go > > b/pkg/datapath/loader/netlink.go > > index 381e1fbc8..00015eabc 100644 > > --- a/pkg/datapath/loader/netlink.go > > +++ b/pkg/datapath/loader/netlink.go > > @@ -106,7 +106,7 @@ func replaceDatapath(ctx context.Context, ifName, > > objPath, progSec, progDirectio > > loaderProg = "tc" > > args = []string{"filter", "replace", "dev", ifName, > > progDirection, > > "prio", "1", "handle", "1", "bpf", "da", "obj", > > objPath, > > - "sec", progSec, > > + "sec", progSec, "verbose", > > } > > } > > cmd = exec.CommandContext(ctx, loaderProg, > > args...).WithFilters(libbpfFixupMsg) > > > > if I remove the "verbose" change, and run the Cilium agent without > > kernel commit 6f8a57ccf8, the problem is gone, it seems commit 6f8a57ccf8 > > is related > > Remove "verbose" should work since the kernel won't do logging any more. > > > > >> > >> this commit looks only dealing with log, accidently fixed the PR issue I > >> have? my PR use __bpf_memcpy_builtin() to rewrite the tunnel inner packet > >> destination MAC address, somehow related? > >> > > [...]