Andrii Nakryiko wrote:
> Hey John,

Sorry, I missed this while I was on PTO that week.

>
> We've been recently experimenting with using BPF_SK_SKB_STREAM_PARSER
> and BPF_SK_SKB_STREAM_VERDICT with sockmap/sockhash to perform
> in-kernel parsing of RSocket frames. A very simple format ([0]) where
> the first 3 bytes specify the size of the frame payload. The idea was
> to collect the entire frame in the kernel before notifying user-space
> that data is available. This is meant to minimize unnecessary wakeups
> due to incomplete logical frames, saving CPU.

Nice.

>
> You can find the BPF source code I've used at [1], it has lots of
> extra logging and stuff, but the idea is to read the first 3 bytes of
> each logical frame, and return the expected full frame size from the
> parser program. The verdict program always just returns SK_PASS.
>
> This seems to work exactly as expected in manual simulations of
> various packet size distributions, and even for a bunch of
> ping/pong-like benchmark (which are very sensitive to correct frame
> length determination, so I'm reasonably confident we don't screw that
> up much). And yet, when benchmarking sending multiple logical RPC
> streams over the same single socket (so many interleaving RSocket
> frames on single socket, but in terms of logical frames nothing should
> change), we often see that while full frame hasn't been accumulated in
> socket receive buffer yet, epoll_wait() for that socket would return
> with success notifying user space that there is data on socket.
> Subsequent recvfrom() call would immediately return -EAGAIN and no
> data, and our benchmark would go on this loop of useless
> epoll_wait()+recvfrom() calls back to back, many times over.

Aha, yes, this sounds bad.

>
> So I have a few questions:
> - is the above use case something that was meant to be handled by
> sockmap+parser/verdict?

We shouldn't wake up user space if there is nothing to read. So yes,
this seems like a valid use case to me.

> - is it correct to assume that epoll won't wake up until amount of
> bytes requested by parser program is accumulated (this seems to be the
> case from manually experimenting with various "packet delays");

It seems there is some bug that races and causes it to wake up user
space. I'm aware of a couple of bugs in the stream parser that I wanted
to fix. I'm not sure I can get to them this week, but I should have time
next week. We also have a couple more fixes to resolve a few HTTPS
server compliance tests.

> - is there some known bug or race in how sockmap and strparser
> framework interacts with epoll subsystem that could cause this weird
> epoll_wait() behavior?

Yes, I know of some races in strparser. I'll elaborate later, probably
with patches, as I don't recall them readily at the moment.

>
> It does seem like some sort of timing issue, but I couldn't pin down
> exactly what are the conditions that this happens in. But it's quite
> reproducible with a pretty high frequency using our internal benchmark
> when multiple logical streams are involved.
>
> Any thoughts or suggestions?

This seems like a bug, and we should fix it. I'm aware of a couple of
issues with the stream parser that we plan to fix, so it could be one of
those or a new one I'm not aware of. I'll take a closer look next week.

> [0] https://rsocket.io/about/protocol/#framing-format
> [1] https://github.com/anakryiko/libbpf-bootstrap/blob/thrift-coalesce-rcvlowat/examples/c/bootstrap.bpf.c
>
> -- Andrii