Andrii Nakryiko wrote:
> On Mon, Jun 22, 2020 at 12:42 PM John Fastabend
> <john.fastabend@xxxxxxxxx> wrote:
> >
> > Andrii Nakryiko wrote:
> > > On Mon, Jun 22, 2020 at 11:30 AM John Fastabend
> > > <john.fastabend@xxxxxxxxx> wrote:
> > > >
> > > > Andrii Nakryiko wrote:
> > > > > On Fri, Jun 19, 2020 at 3:21 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 6/19/20 8:41 PM, Andrii Nakryiko wrote:
> > > > > > > On Fri, Jun 19, 2020 at 6:08 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> > > > > > >> On 6/19/20 2:39 AM, John Fastabend wrote:
> > > > > > >>> John Fastabend wrote:
> > > > > > >>>> Andrii Nakryiko wrote:
> > > > > > >>>>> On Thu, Jun 18, 2020 at 11:58 AM John Fastabend
> > > > > > >>>>> <john.fastabend@xxxxxxxxx> wrote:
> > > > > > >>>
> > > > > > >>> [...]
> > > > > > >>>
> > > > > > >>>>> That would be great. Self-tests do work, but having more testing with
> > > > > > >>>>> real-world application would certainly help as well.
> > > > > > >>>>
> > > > > > >>>> Thanks for all the follow up.
> > > > > > >>>>
> > > > > > >>>> I ran the change through some CI on my side and it passed so I can
> > > > > > >>>> complain about a few shifts here and there or just update my code or
> > > > > > >>>> just not change the return types on my side but I'm convinced it's OK
> > > > > > >>>> in most cases and helps in some so...
> > > > > > >>>>
> > > > > > >>>> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
> > > > > > >>>
> > > > > > >>> I'll follow this up with a few more selftests to capture a couple of our
> > > > > > >>> patterns. These changes are subtle and I worry a bit that additional
> > > > > > >>> <<,s>> pattern could have the potential to break something.
> > > > > > >>>
> > > > > > >>> Another one we didn't discuss that I found in our code base is feeding
> > > > > > >>> the output of a probe_* helper back into the size field (after some
> > > > > > >>> alu ops) of subsequent probe_* call.
> > > > > > >>> Unfortunately, the tests I ran
> > > > > > >>> today didn't cover that case.
> > > > > > >>>
> > > > > > >>> I'll put it on the list tomorrow and encode these in selftests. I'll
> > > > > > >>> let the maintainers decide if they want to wait for those or not.
> > > > > > >>
> > > > > > >> Given potential fragility on verifier side, my preference would be that we
> > > > > > >> have the known variations all covered in selftests before moving forward in
> > > > > > >> order to make sure they don't break in any way. Back in [0] I've seen mostly
> > > > > > >> similar cases in the way John mentioned in other projects, iirc, sysdig was
> > > > > > >> another one. If both of you could hack up the remaining cases we need to
> > > > > > >> cover and then submit a combined series, that would be great. I don't think
> > > > > > >> we need to rush this optimization w/o necessary selftests.
> > > > > > >
> > > > > > > There is no rush, but there is also no reason to delay it. I'd rather
> > > > > > > land it early in the libbpf release cycle and let people try it in
> > > > > > > their prod environments, for those concerned about such code patterns.
> > > > > >
> > > > > > Andrii, define 'delay'. John mentioned above to put together few more
> > > > > > selftests today so that there is better coverage at least, why is that
> > > > > > an 'issue'? I'm not sure how you read 'late in release cycle' out of it,
> > > > > > it's still as early. The unsigned optimization for len <= MAX_LEN is
> > > > > > reasonable and makes sense, but it's still one [specific] variant only.
> > > > >
> > > > > I'm totally fine waiting for John's tests, but I read your reply as a
> > > > > request to go dig up some more examples from sysdig and other
> > > > > projects, which I don't think I can commit to. So if it's just about
> > > > > waiting for John's examples, that's fine and sorry for
> > > > > misunderstanding.
> > > > >
> > > > > >
> > > > > > > I don't have a list of all the patterns that we might need to test.
> > > > > > > Going through all open-source BPF source code to identify possible
> > > > > > > patterns and then coding them up in minimal selftests is a bit too
> > > > > > > much for me, honestly.
> > > > > >
> > > > > > I think we're probably talking past each other. John wrote above:
> > > > >
> > > > > Yep, sorry, I assumed more general context, not specifically John's reply.
> > > > >
> > > > > >
> > > > > > >>> I'll follow this up with a few more selftests to capture a couple of our
> > > > > > >>> patterns. These changes are subtle and I worry a bit that additional
> > > > > > >>> <<,s>> pattern could have the potential to break something.
> > > > > >
> > > > > > So submitting this as a full series together makes absolutely sense to me,
> > > > > > so there's maybe not perfect but certainly more confidence that also other
> > > > > > patterns where the shifts optimized out in one case are then appearing in
> > > > > > another are tested on a best effort and run our kselftest suite.
> > > > > >
> > > > > > Thanks,
> > > > > > Daniel
> > > >
> > > > Hi Andrii,
> > > >
> > > > How about adding this on-top of your selftests patch? It will cover the
> > > > cases we have now with 'len < 0' check vs 'len > MAX'. I had another case
> > > > where we feed the out 'len' back into other probes but this requires more
> > > > hackery than I'm willing to encode in a selftests. There seems to be
> > > > some better options to improve clang side + verifier and get a clean
> > > > working version in the future.
> > >
> > > Ok, sounds good. I'll add it as an extra patch. Not sure about all the
> > > conventions with preserving Signed-off-by. Would just keeping your
> > > Signed-off-by be ok?
> > > If you don't mind, though, I'll keep each
> > > handler{32,64}_{gt,lt} as 4 independent BPF programs, so that if any
> > > of them is unverifiable, it's easier to inspect the BPF assembly. Yell
> > > if you don't like this.
> >
> > works for me, go for it.
> >
> > >
> > > >
> > > > On the clang/verifier side though I think the root cause is we do a poor
> > > > job of tracking >>32, s<<32 case. How about we add a sign-extend instruction
> > > > to BPF? Then clang can emit BPF_SEXT_32_64 and verifier can correctly
> > > > account for it and finally backends can generate better code. This
> > > > will help here, but also any other place we hit the sext codegen.
> > > >
> > > > Alexei, Yonghong, any opinions for/against adding new insn? I think we
> > > > talked about doing it earlier.
> > >
> > > Seems like an overkill to me, honestly. I'd rather spend effort on
> > > teaching Clang to always generate `w1 = w0` for such a case (for
> > > alu32). For no-ALU32 recommendation would be to switch to ALU32, if
> > > you want to work with int instead of long and care about two bitshift
> > > operations. If you can stick to longs on no-ALU32, then no harm, no
> > > foul.
> > >
> >
> > Do you have an example of where clang doesn't generate just `w1 = w0`
> > for the alu32 case? It really should at this point I'm not aware of
> > any cases where it doesn't. I think you might have mentioned this
> > earlier but I'm not seeing it.
>
> Yeah, ALU32 + LONG for helpers + u32 for len variable. I actually call
> this out explicitly in the commit message for this patch.
>

Maybe we are just saying the same thing, but the <<32, s>>32 pattern from
ALU32 + LONG for helpers + u32 is because LLVM generated an LLVM IR sext
instruction. We need the sext because it's promoting a u32 type to a long.
We can't just replace those with MOV instructions like we do with zext,
giving the `w1 = w0`. We would have to "know" the helper call zeroed the
upper bits, but this isn't guaranteed by the C standard.
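
To make the sext vs. zext distinction above concrete, here is a minimal plain-C sketch (not BPF code, and not taken from the patch). The function names are made up for illustration, and it assumes the compiler implements right shift of a negative signed value as an arithmetic shift, which GCC and Clang do but which C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 32 bits of x to 64 bits with
 * the shift pair discussed above (<<32 followed by s>>32). The left shift
 * is done on an unsigned value to avoid undefined behavior; the arithmetic
 * right shift of the signed result is an assumption about the compiler. */
static int64_t sext32_via_shifts(uint64_t x)
{
    return (int64_t)(x << 32) >> 32;
}

/* Hypothetical helper: zero-extend the low 32 bits of x, which is what a
 * 32-bit register move such as `w1 = w0` produces. */
static uint64_t zext32(uint64_t x)
{
    return (uint32_t)x;
}
```

For a value whose low 32 bits encode -14, `sext32_via_shifts` yields -14 while `zext32` yields 4294967282. For non-negative 32-bit values the two agree, which is why a plain 32-bit move is only a safe replacement when the upper bits are known to be zero.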
My suggestion to fix this is to generate a BPF_SEXT and then let the
verifier handle it and the JITs generate good code for it. On x86 we have
a sign-extending move, MOVSX, for example. Trying to go the other way and
enforce that callees zero the upper bits of the return register seems
inconsistent and more difficult to implement.

> >
> > There are other cases where sext gets generated in normal code and
> > it would be nice to not always have to work around it.
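
For reference, the two length-check styles mentioned earlier in the thread ('len < 0' vs. 'len > MAX', and the unsigned check against MAX_LEN) can be sketched in plain C as follows. This is only an illustration under stated assumptions: `fake_probe_read_str`, the two `len_ok_*` functions, and the `MAX_LEN` value are hypothetical and are not the selftests from the patch.

```c
#define MAX_LEN 256 /* hypothetical buffer limit for this sketch */

/* Hypothetical stand-in for a probe_*-style helper: returns a byte count
 * on success or a negative error code on failure. Here it just echoes
 * the simulated result it is given. */
static long fake_probe_read_str(long simulated_result)
{
    return simulated_result;
}

/* Style 1: keep the result signed, reject errors explicitly, then apply
 * the upper bound. */
static int len_ok_signed(long len)
{
    if (len < 0)
        return 0;
    return len <= MAX_LEN;
}

/* Style 2: compare as unsigned. A negative error code converts to a very
 * large unsigned value, so one comparison rejects both errors and
 * oversized lengths. */
static int len_ok_unsigned(long len)
{
    return (unsigned long)len <= MAX_LEN;
}
```

Both functions accept and reject the same inputs; the difference is in how many comparisons the compiler has to emit and, in BPF, what bounds the verifier can derive from them.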