Re: Sockmap's parser/verdict programs and epoll notifications

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 11, 2023 at 7:43 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>
> Andrii Nakryiko wrote:
> > On Sun, Jul 16, 2023 at 9:37 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > >
> > > Andrii Nakryiko wrote:
> > > > Hey John,
> > >
> > > Sorry missed this while I was on PTO that week.
> >
> > yeah, vacations tend to cause missing things :)
> >
> > >
> > > >
> > > > We've been recently experimenting with using BPF_SK_SKB_STREAM_PARSER
> > > > and BPF_SK_SKB_STREAM_VERDICT with sockmap/sockhash to perform
> > > > in-kernel parsing of RSocket frames. A very simple format ([0]) where
> > > > the first 3 bytes specify the size of the frame payload. The idea was
> > > > to collect the entire frame in the kernel before notifying user-space
> > > > that data is available. This is meant to minimize unnecessary wakeups
> > > > due to incomplete logical frames, saving CPU.
> > >
> > > Nice.
> > >
> > > >
> > > > You can find the BPF source code I've used at [1], it has lots of
> > > > extra logging and stuff, but the idea is to read the first 3 bytes of
> > > > each logical frame, and return the expected full frame size from the
> > > > parser program. The verdict program always just returns SK_PASS.
> > > >
> > > > This seems to work exactly as expected in manual simulations of
> > > > various packet size distributions, and even for a bunch of
> > > > ping/pong-like benchmark (which are very sensitive to correct frame
> > > > length determination, so I'm reasonably confident we don't screw that
> > > > up much). And yet, when benchmarking sending multiple logical RPC
> > > > streams over the same single socket (so many interleaving RSocket
> > > > frames on single socket, but in terms of logical frames nothing should
> > > > change), we often see that while full frame hasn't been accumulated in
> > > > socket receive buffer yet, epoll_wait() for that socket would return
> > > > with success notifying user space that there is data on socket.
> > > > Subsequent recvfrom() call would immediately return -EAGAIN and no
> > > > data, and our benchmark would go on this loop of useless
> > > > epoll_wait()+recvfrom() calls back to back, many times over.
> > >
> > > Aha yes this sounds bad.
> > >
> > > >
> > > > So I have a few questions:
> > > >   - is the above use case something that was meant to be handled by
> > > > sockmap+parser/verdict?
> > >
> > > We shouldn't wake up user space if there is nothing to read. So
> > > yes this seems like a valid use case to me.
> > >
> > > >   - is it correct to assume that epoll won't wake up until amount of
> > > > bytes requested by parser program is accumulated (this seems to be the
> > > > case from manually experimenting with various "packet delays");
> > >
> > > Seems there is some bug that races and causes it to wake up
> > > user space. I'm aware of a couple bugs in the stream parser
> > > that I wanted to fix. Not sure I can get to them this week
> > > but should have time next week. We have a couple more fixes
> > > to resolve a few HTTPS server compliance tests as well.
> > >
> > > >   - is there some known bug or race in how sockmap and strparser
> > > > framework interacts with epoll subsystem that could cause this weird
> > > > epoll_wait() behavior?
> > >
> > > Yes I know of some races in strparser. I'll elaborate later
> > > probably with patches as I don't recall them readily at the
> > > moment.
> >
> > So I missed a good chunk of BPF mailing list traffic while I was on my
> > PTO. Did you end up getting to these bugs in strparser logic? Should I
> > try running the latest bpf-next/net-next on our production workload to
> > see if this is still happening?
>
> You will likely still hit there error I haven't got it out of my queue
> yet. I just knocked off a couple things last week so could probably
> take a look at flushing my queue this week. Then it would make sense
> to retest to see if its something new or not.
>
> I'll at least send an RFC with the idea even if I don't get to testing
> it yet.

Sounds good, thanks a lot!

>
> Thanks,
> John





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux