On Wed, Jul 10, 2024 at 11:41 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-07-10 at 10:50 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > > > Ok, I see, this wasn't obvious that you want this behavior. I actually
> > > > find it surprising (and so at least let's call this possibility out).
> > > >
> > > > But, tbh, I'd make this stricter. I'd dictate that within a subprogram
> > > > all no_csr patterns *should* end with the same stack offset and I
> > > > would check for that (so what I mentioned before that instead of min
> > > > or max we need assignment and a check for equality if we already
> > > > assigned it once).
> > >
> > > This makes sense, but does not cover a theoretical corner case:
> > > - suppose there are two nocsr functions A and B on the kernel side,
> > >   but only A was marked as nocsr during program compilation;
> > > - the spill/fill "bracket" was generated for a call to function B by
> > >   the compiler (because this is just valid codegen).
> >
> > In this case the call to A shouldn't be changing nocsr_offset at all,
> > though? You should find no spill/fill and thus even though A is
> > *allowed* to use no_csr, it doesn't, so it should have no effect on
> > nocsr offsets, no?
>
> Consider the following code:
>
>   *(u64 *)(r10 - 24) = r1;
>   call A                    // kernel and clang know that A is nocsr
>   r1 = *(u64 *)(r10 - 24);  // kernel assumes .nocsr_stack_offset -24
>   ...
>   *(u64 *)(r10 - 16) = r1;
>   call B                    // only kernel knows that B is nocsr
>   r1 = *(u64 *)(r10 - 16);  // with stricter logic this would disable
>                             // nocsr for the whole sub-program

oh, you mean that the r10-16 accesses are valid instructions emitted by
the compiler, just not due to nocsr? I mean, tough luck?... We shouldn't
reject this, but nocsr is ultimately a performance optimization, so it's
not critical if it doesn't work within some subprogram.

>
> > But your example actually made me think about another (not theoretical
> > at all) case. What if we have calls to A and B, the kernel is slightly
> > old and knows that B is nocsr, but A is not. But the BPF program was
> > compiled with the latest helper/kfunc definitions marking A as no_csr
> > eligible (as well as B). (btw, THAT'S THE WORD for allow_csr --
> > ELIGIBLE. csr_eligible FTW! but I digress...)
> >
> > With the case above we'll disable nocsr for both A and B, right? That
> > sucks, but not sure if we can do anything about that. (We can probably
> > assume no_csr pattern and thus allow spill/fill and not disable nocsr
> > in general, but not remove spills/fills... a bit more complication
> > upfront for longer term extensibility.. not sure, maybe performance
> > regression is a fine price, hmm)
> >
> > So I take it back about unmarking csr in the *entire BPF program*,
> > let's just limit it to the subprog scope. But I still think we should
> > do it eagerly, rather than double checking in do_misc_fixups().
> > WDYT?
>
> With the current algo this situation would disable nocsr indeed.
> The problem is that the check for spilled stack slot usage is too simplistic.
> However, it could be covered if the check is performed using e.g. the
> process I described earlier:
> - for a spill, remember the defining instruction in the stack slot structure;
> - for a fill, "merge" the defining instruction index;
> - for any other stack access, mark the defining instruction as escaping.

Sorry, I have no idea what the above means and implies ("defining
instruction", "merge", "escaping"). As I mentioned above, I'd keep it as
simple as reasonably possible.
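
To make the simple version concrete -- the "assign once, then require the
same offset" check from the top of this message -- here is a minimal
standalone sketch. Only the nocsr_stack_off name and the -24/-16 offsets
come from this thread; the struct, helper, and everything else are made up
for illustration and are not the actual verifier code:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for per-subprogram nocsr bookkeeping. */
struct subprog_info {
        bool nocsr_stack_off_set;   /* has an offset been recorded yet? */
        int  nocsr_stack_off;       /* offset every pattern must agree on */
        bool keep_nocsr;            /* false once the contract is breached */
};

/* Called for every spill/fill pattern detected around a nocsr call. */
static void record_nocsr_offset(struct subprog_info *sp, int off)
{
        if (!sp->keep_nocsr)
                return;                          /* nocsr already dropped here */
        if (!sp->nocsr_stack_off_set) {
                sp->nocsr_stack_off = off;       /* first pattern defines the offset */
                sp->nocsr_stack_off_set = true;
        } else if (sp->nocsr_stack_off != off) {
                sp->keep_nocsr = false;          /* mismatch: drop nocsr for the subprog */
        }
}

int main(void)
{
        struct subprog_info sp = { .keep_nocsr = true };

        record_nocsr_offset(&sp, -24);   /* pattern around "call A" above */
        record_nocsr_offset(&sp, -16);   /* pattern around "call B", other offset */
        printf("nocsr kept for subprog: %s\n", sp.keep_nocsr ? "yes" : "no");
        return 0;
}

With the two different offsets from the quoted example, the second pattern
breaches the contract and nocsr is dropped for the whole subprogram, which
is exactly the trade-off being debated here.
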
> > > > > So I wanted to keep the nocsr check a little bit more permissive.
> > > >
> > > > Also, instead of doing that extra nocsr offset check in
> > > > do_misc_fixups(), why don't we just reset all no_csr patterns within a
> > > > subprogram *if we find a single violation*? The compiler shouldn't ever
> > > > emit such code, right? So let's be strict and fall back to not
> > > > recognizing nocsr.
> > > >
> > > > And then we won't need that extra check in do_misc_fixups() because we
> > > > eagerly unset the no_csr flag and will never hit that piece of logic in
> > > > patching.
> > >
> > > I can do that, but the detector pass would have to be two-pass:
> > > - on the first pass, find the nocsr_stack_off, add candidate insn marks;
> > > - on the second pass, remove marks from insns with the wrong stack access offset.
> >
> > It's not really a second pass, it's part of normal validation.
> > check_nocsr_stack_contract() will detect this and will do, yes, a pass
> > over all instructions of a subprogram to unmark them.
>
> - on the first pass the true .nocsr_stack_off is not yet known,
>   so .nocsr_pattern is set optimistically;
> - on the second pass .nocsr_stack_off is already known,
>   so .nocsr_pattern can be removed from spills/fills outside the range;
> - check_nocsr_stack_contract() does not need to scan the full sub-program
>   on each violation, it can set a flag disabling nocsr in the subprogram info.

I'm not even sure anymore if we are agreeing or disagreeing, let's just
look at the next revision or something, I'm getting lost. I was saying
that as soon as check_nocsr_stack_contract() detects that the contract is
breached, we eagerly set nocsr_pattern and nocsr_spills_num to 0 for all
instructions within the current subprog and set nocsr_stack_off in the
subprog info to 0 (there is no more csr). There will be no other
violations after that within that subprog (a rough sketch of this loop
follows below, after the quoted exchange). Maybe that's what you mean as
well and I just can't read.

>
> > > Otherwise I can't discern true patterns from false positives in the
> > > situation described above.
> >
> > See above, I might be missing what your "theoretical" example changes.
> > > >
> > > > I'd go even further and say that on any nocsr invariant violation, we
> > > > just go and reset all nocsr pattern flags across the entire BPF program
> > > > (all subprogs included). In check_nocsr_stack_contract() I mean. Just
> > > > have a loop over all instructions and set that flag to false.
> > > >
> > > > Why not? What realistic application would need to violate nocsr in
> > > > some subprogs but not in others?
> > > >
> > > > KISS, or is it too simplistic for some reason?
> > >
> > > I can invalidate nocsr per-program.
> > > If so, I would use an "allow_nocsr" flag in program aux to avoid
> > > additional passes over instructions.
> >
> > yep, see above (but let's keep it per-subprog for now)
>
> Ok
>
> > > > > > > bpf_patch_insn_data() assumes that one instruction has to be replaced with many.
> > > > > > > Here I need to replace many instructions with a single instruction.
> > > > > > > I'd prefer not to tweak bpf_patch_insn_data() for this patch-set.
> > > > > >
> > > > > > ah, bpf_patch_insn_data just does bpf_patch_insn_single, somehow I
> > > > > > thought that it does range-for-range replacement of instructions.
> > > > > > Never mind then (it's a bit sad that we don't have a proper flexible
> > > > > > and powerful patching primitive, though, but oh well).
> > > > >
> > > > > That is true.
> > > > > I'll think about it once other issues with this patch-set are resolved.
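
Coming back to the eager invalidation mentioned above ("as soon as
check_nocsr_stack_contract() detects that the contract is breached ..."),
it boils down to a loop of roughly the following shape. This is only a
sketch: the field names (nocsr_pattern, nocsr_spills_num, nocsr_stack_off)
come from this discussion, but the surrounding types and the helper are
invented here and are not the real verifier code:

#include <stdbool.h>

/* Illustrative stand-ins; only the field names come from the discussion. */
struct insn_aux {
        bool nocsr_pattern;      /* insn recognized as part of a spill/fill pair */
        int  nocsr_spills_num;   /* number of spills/fills around the call */
};

struct subprog {
        int start;               /* first insn index of the subprogram */
        int end;                 /* one past the last insn index */
        int nocsr_stack_off;     /* 0 means nocsr is off for this subprogram */
};

/* Eagerly drop nocsr for one subprogram after a contract violation. */
void clear_nocsr_marks(struct insn_aux *aux, struct subprog *sp)
{
        for (int i = sp->start; i < sp->end; i++) {
                aux[i].nocsr_pattern = false;
                aux[i].nocsr_spills_num = 0;
        }
        /* no offset left to honor, so no spills/fills get removed later */
        sp->nocsr_stack_off = 0;
}

After this runs once for a subprogram, later instructions in the same
subprogram can no longer trigger further violations, which is the point of
doing it eagerly instead of re-checking in do_misc_fixups().
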
> > > >
> > > > yeah, it's completely unrelated, but having a bpf_patch_insns() that
> > > > takes an input instruction range to be replaced and replaces it with
> > > > another set of instructions would cover all existing use cases and
> > > > some more. We wouldn't need verifier_remove_insns(), because that's
> > > > just replacing existing instructions with an empty new set of
> > > > instructions. We would then have an "insert instructions" primitive if we
> > > > specify an empty input range of instructions. If we add a flag for whether to
> > > > adjust jump offsets and make it configurable, we'd have the thing that
> > > > Alexei needed for may_goto without any extra hacks.
> > > >
> > > > Furthermore, I find it wrong and silly that we keep having manual
> > > > delta+insn adjustments in every single piece of patching logic. I
> > > > think this should be done by bpf_patch_insns(). We need to have a
> > > > small "insn_patcher" struct that we use during instruction patching,
> > > > and a small instruction iterator on top that would keep returning the next
> > > > instruction to process (and its index, probably). This will allow us
> > > > to optimize patching and instead of doing O(N) we can have something
> > > > faster and smarter (if we hide direct insn array accesses during
> > > > patching).
> > > >
> > > > But this is all a completely separate story from all of this.
> > >
> > > I actually suggested something along these lines a very long time ago:
> > > https://lore.kernel.org/bpf/5c297e5009bcd4f0becf59ccd97cfd82bb3a49ec.camel@xxxxxxxxx/
> >
> > We had a few discussions around patching improvements (you can see I
> > mentioned that in that thread as well). We all like to talk about
> > improving this, but don't really do anything about it. :) Tech debt
> > at its best.
>
> If you are not completely opposed to that idea I can follow up with a
> complete version as an RFC.

let's do one thing at a time, there is plenty to review for nocsr without
having to switch to patching.
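
For whoever picks this up later, the range-for-range primitive described in
the quoted paragraphs above might have a signature along these lines. This
is purely a guess from the discussion: no such helper exists today, the
parameters and the adjust_jmps knob are invented here, and the kernel types
(struct bpf_verifier_env, struct bpf_insn) are assumed to be available:

/*
 * Hypothetical range-for-range patching primitive (illustration only).
 *
 * Replace instructions [off, off + old_cnt) with new_insns[0 .. new_cnt):
 *   - old_cnt == 0               -> pure insertion before 'off';
 *   - new_cnt == 0               -> removal (covers verifier_remove_insns());
 *   - old_cnt == 1, new_cnt > 1  -> what bpf_patch_insn_single() handles today;
 *   - adjust_jmps                -> whether surrounding jump offsets are fixed
 *                                   up (the knob mentioned for may_goto).
 */
struct bpf_prog *bpf_patch_insns(struct bpf_verifier_env *env,
                                 u32 off, u32 old_cnt,
                                 const struct bpf_insn *new_insns, u32 new_cnt,
                                 bool adjust_jmps);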