Hello bpf, We have run into a verifier issue that had us scratching our heads for a while. The issues has been fixed by Eduard's recent series reworking the verification of bpf_loop callbacks ("verify callbacks as if they are called unknown number of times") [1]. I'm not sure if this kind of consequences of the bug that prompted those patches was understood; we reported the bug originally in [2] and we certainly did not understand it. The point of this message is to consider the backporting of that series (or perhaps of a more targeted fix); if those patches are already slated to be backported, or if what I'm describing is old news for everyone, maybe you can stop reading and sorry for the noise. The symptom that we've encountered is an instruction from the input assembly being erroneously dropped by the verifier. The verifier thinks that one branch of a conditional jump is never taken (and thus the test and one branch are eliminated), when in reality it most certainly is. This is similar to miscompilation and so quite bad for unsuspecting code. In contrast, the bug report in [2] only imagined that programs can pass verification when they shouldn't, not that they'll be "miscompiled". The issue is as follows: before Eduard's patches, the callback passed to bpf_loop() is verified as if it is only executed once. As such, the state that the verifier knows about the data referenced by the callback_ctx pointer only takes into account the possible values of the respective data before the loop starts; the verifier *does not* take into account the fact that the loop might execute multiple times and mutate that data referenced by the ctx pointer. The way this leads to garbage code is: - inside ctx, you can have a pointer to the stack - before the first iteration, the respective stack slot is known by the verifier to be 0 - the loop callback also has a conditional branch on the respective stack slot being != 0 - the loop callback sometimes mutates the respective stack slot - the verifier assumes that the branch is never taken (so it never verifies it), and so the test and the branch are elided by opt_remove_dead_code() because the `seen` flag on the branch's instructions is not set The mitigation on kernels before the fix, I guess, would be to not reference stack memory from the `ctx`. Instead, limit yourself to map data whose values are always unknown to the verifier. If Eduard's patch cannot be backported, I had a random thought that perhaps dead-code elimination could be partially disabled (perhaps only for loop callbacks, ignoring the functions that the callbacks call). Thanks, - Andrei [1] https://lore.kernel.org/bpf/20231121020701.26440-1-eddyz87@xxxxxxxxx/ [2] https://lore.kernel.org/bpf/CA+vRuzPChFNXmouzGG+wsy=6eMcfr1mFG0F3g7rbg-sedGKW3w@xxxxxxxxxxxxxx/