On Wed, Jul 10, 2024 at 12:03 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-07-10 at 11:49 -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2024 at 11:41 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > >
> > > On Wed, 2024-07-10 at 10:50 -0700, Andrii Nakryiko wrote:
> > >
> > > [...]
> > >
> > > > > > Ok, I see, this wasn't obvious that you want this behavior. I actually
> > > > > > find it surprising (and so at least let's call this possibility out).
> > > > > >
> > > > > > But, tbh, I'd make this stricter. I'd dictate that within a subprogram
> > > > > > all no_csr patterns *should* end with the same stack offset and I
> > > > > > would check for that (so, what I mentioned before: instead of min
> > > > > > or max we need an assignment, and a check for equality if we already
> > > > > > assigned it once).
> > > > >
> > > > > This makes sense, but does not cover a theoretical corner case:
> > > > > - suppose there are two nocsr functions A and B on the kernel side,
> > > > >   but only A was marked as nocsr during program compilation;
> > > > > - the spill/fill "bracket" was generated for a call to function B by
> > > > >   the compiler (because this is just valid codegen).
> > > >
> > > > In this case the call to A shouldn't be changing nocsr_offset at all,
> > > > though? You should find no spill/fill, and thus even though A is
> > > > *allowed* to use no_csr, it doesn't, so it should have no effect on
> > > > nocsr offsets, no?
> > >
> > > Consider the following code:
> > >
> > >     *(u64 *)(r10 - 24) = r1;
> > >     call A                   // kernel and clang know that A is nocsr
> > >     r1 = *(u64 *)(r10 - 24); // kernel assumes .nocsr_stack_offset -24
> > >     ...
> > >     *(u64 *)(r10 - 16) = r1;
> > >     call B                   // only kernel knows that B is nocsr
> > >     r1 = *(u64 *)(r10 - 16); // with stricter logic this would disable
> > >                              // nocsr for the whole sub-program
> >
> > oh, you mean that r10-16 accesses are valid instructions emitted by
> > the compiler, but not due to nocsr?
> > I mean, tough luck?... We shouldn't
> > reject this, but nocsr is ultimately a performance optimization, so
> > it's not critical if it doesn't work within some subprogram.
>
> Not critical, but the difference between allowing and disallowing nocsr
> for this case is the '<' (current) vs '==' (suggested) check for
> .nocsr_stack_offset. I think that the current algo should not be made
> more strict.
>
ok

> > > > But your example actually made me think about another (not theoretical
> > > > at all) case. What if we have calls to A and B, the kernel is slightly
> > > > old and knows that B is nocsr, but A is not. But the BPF program was
> > > > compiled with the latest helper/kfunc definitions marking A as no_csr
> > > > eligible (as well as B). (btw, THAT'S THE WORD for allow_csr --
> > > > ELIGIBLE. csr_eligible FTW! but I digress...)
> > > >
> > > > With the case above we'll disable nocsr for both A and B, right? That
> > > > sucks, but I'm not sure if we can do anything about that. (We can
> > > > probably assume the no_csr pattern and thus allow the spill/fill and
> > > > not disable nocsr in general, but not remove the spills/fills... a bit
> > > > more complication upfront for longer term extensibility... not sure,
> > > > maybe a performance regression is a fine price, hmm)
> > > >
> > > > So I take it back about unmarking csr in the *entire BPF program*,
> > > > let's just limit it to the subprog scope. But I still think we should
> > > > do it eagerly, rather than double checking in do_misc_fixups().
> > > > WDYT?
> > >
> > > With the current algo this situation would indeed disable nocsr.
> > > The problem is that the check for spilled stack slot usage is too
> > > simplistic. However, it could be covered if the check is performed
> > > using e.g. the process I described earlier:
> > > - for a spill, remember the defining instruction in the stack slot structure;
> > > - for a fill, "merge" the defining instruction index;
> > > - for any other stack access, mark the defining instruction as escaping.
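The three bullets above could be sketched in C roughly as follows. This is a toy model for illustration only: `slot_state`, `insn_aux` and the function names are made up, not the verifier's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_DEF (-1)

/* Per-stack-slot state kept while scanning a subprogram. */
struct slot_state {
	int def_insn;   /* index of the instruction that last wrote the slot */
	bool escaped;   /* the spilled value was used outside a fill */
};

/* Per-instruction aux data for a fill instruction. */
struct insn_aux {
	int fill_def_insn; /* defining instruction "merged" into this fill */
};

static void track_spill(struct slot_state *slot, int insn_idx)
{
	/* bullet 1: remember the defining instruction in the slot */
	slot->def_insn = insn_idx;
	slot->escaped = false;
}

static void track_fill(const struct slot_state *slot, struct insn_aux *aux)
{
	/* bullet 2: "merge" the defining instruction index into insn aux;
	 * a real version would combine indices arriving from different
	 * paths and give up when they disagree */
	aux->fill_def_insn = slot->def_insn;
}

static void track_other_access(struct slot_state *slot)
{
	/* bullet 3: any other use of the slot makes the value escape */
	slot->escaped = true;
}

/* A spill/fill pair is removable iff the fill has exactly one defining
 * instruction and the spilled value never escapes. */
static bool pair_removable(const struct slot_state *slot,
			   const struct insn_aux *aux)
{
	return aux->fill_def_insn != NO_DEF &&
	       aux->fill_def_insn == slot->def_insn &&
	       !slot->escaped;
}
```

With this shape, the example from the thread (spill at (1), call at (2), fill at (3)) marks the pair removable unless some other instruction touches the slot in between.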
> >
> > Sorry, I have no idea what the above means and implies: "defining
> > instruction", "merge", "escaping".
> >
> > As I mentioned above, I'd keep it as simple as reasonably possible.
>
> It should not be much more complex than the current implementation:
>
> 1: *(u64 *)(r10 - 16) = r1; // for stack slot -16 remember that
>                             // it is defined at insn (1)
> 2: call %[nocsr_func]
> 3: r1 = *(u64 *)(r10 - 16); // the value read from the stack is defined
>                             // at (1), so remember this in insn aux
>
> If (1) is the only defining instruction for (3), and the value written
> at (1) is not used by other instructions (i.e. does not "escape", e.g.
> is not passed as a function argument), the pair (1) and (3) could be
> removed.
>
> > As I mentioned above, I'd keep it as simple as reasonably possible.
> >
> > > - on the first pass the true .nocsr_stack_off is not yet known,
> > >   so .nocsr_pattern is set optimistically;
> > > - on the second pass .nocsr_stack_off is already known,
> > >   so .nocsr_pattern can be removed from spills/fills outside the range;
> > > - check_nocsr_stack_contract() does not need to scan the full
> > >   sub-program on each violation, it can set a flag disabling nocsr
> > >   in the subprogram info.
> >
> > I'm not even sure anymore if we are agreeing or disagreeing, let's
> > just look at the next revision or something, I'm getting lost.
> >
> > I was saying that as soon as check_nocsr_stack_contract() detects that
> > the contract is breached, we eagerly set nocsr_pattern and
> > nocsr_spills_num to 0 for all instructions within the current subprog
> > and set nocsr_stack_off in the subprog info to 0 (there is no more
> > csr). There will be no other violations after that within that
> > subprog. Maybe that's what you mean as well and I just can't read.
>
> Ok, I'll send v3, let's proceed from there.
>
> [...]
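The eager-disable idea discussed above (on the first contract violation, clear the pattern markers for the whole subprogram so no further violations can occur) could be sketched as below. Again, this is a simplified model with invented names and layouts, not the kernel's actual insn aux or subprog info structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-instruction aux data and subprogram info. */
struct insn_aux {
	bool nocsr_pattern;
	int nocsr_spills_num;
};

struct subprog_info {
	int start, end;      /* [start, end) instruction index range */
	int nocsr_stack_off; /* 0 means "no nocsr in this subprog" */
};

/* Eagerly disable nocsr for the whole subprogram once the stack
 * contract is found breached; afterwards no instruction in the range
 * can trigger another violation. */
static void disable_nocsr_in_subprog(struct insn_aux *aux,
				     struct subprog_info *sp)
{
	for (int i = sp->start; i < sp->end; i++) {
		aux[i].nocsr_pattern = false;
		aux[i].nocsr_spills_num = 0;
	}
	sp->nocsr_stack_off = 0;
}
```

The design point being argued is that this runs once per subprogram at most, so the contract check itself never has to rescan the subprogram on later violations.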