Re: [Patch bpf v3] skmsg: check sk_rcvbuf limit before queuing to ingress_skb

Cong Wang wrote:
> On Wed, Oct 13, 2021 at 7:07 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> > >
> > > Jiang observed OOM frequently when testing our AF_UNIX/UDP
> > > proxy. This is due to the fact that we do not actually limit
> > > the socket memory before queueing skb to ingress_skb. We
> > > charge the skb memory later when handling the psock backlog,
> > > and it is not limited either.

[...]

> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index e8b48df73c85..8b243fcdbb8f 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1665,6 +1665,8 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > >                       if (used <= 0) {
> > >                               if (!copied)
> > >                                       copied = used;
> > > +                             if (used == -EAGAIN)
> > > +                                     continue;
> >
> > This is not a good idea. Looping through read_sock because we have
> > hit a memory limit is not going to work. If something is holding the
> > memlimit pinned this could loop indefinitely.
> >
> > Also this will run the verdict multiple times on the same bytes. For
> > apply/cork logic this will break plus just basic parsers will be
> > confused when they see duplicate bytes.
> 
> Good point! I've run out of ideas for dealing with this TCP case,
> dropping is not okay, retrying is hard, reworking TCP ACKing
> is even harder. :-/

I think it can be done with a retry queue on the skmsg side. I'll give
it a try today/tomorrow.
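
To make the retry-queue idea concrete, here is a rough user-space sketch.
It is not kernel code and not the eventual patch: struct toy_sock and all
toy_* helpers are hypothetical names made up for illustration, and the
"verdict" is reduced to a counter. The point it tries to show is that a
packet which does not fit under the rcvbuf-style limit is parked after its
verdict has run once, and a deferred worker later re-attempts only the
charge, so there is no busy loop and no second verdict on the same bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RETRY_MAX 8

/* Hypothetical stand-in for a socket with a receive-buffer limit. */
struct toy_sock {
	size_t rmem_alloc;        /* bytes currently charged to the socket */
	size_t rcvbuf;            /* limit, playing the role of sk_rcvbuf */
	size_t retry[RETRY_MAX];  /* sizes of parked packets (verdict already run) */
	size_t retry_len;
	unsigned int verdict_runs; /* how many times the "verdict" has run */
};

/* Charge len bytes only if that keeps us within the limit. */
static bool toy_try_charge(struct toy_sock *sk, size_t len)
{
	if (sk->rmem_alloc + len > sk->rcvbuf)
		return false;
	sk->rmem_alloc += len;
	return true;
}

/*
 * Ingress path: the verdict runs exactly once per packet. If the packet
 * cannot be charged now (or older packets are already parked, to keep
 * ordering), park it instead of looping. Returns false only when the
 * retry queue itself is full.
 */
static bool toy_ingress(struct toy_sock *sk, size_t len)
{
	sk->verdict_runs++;
	if (sk->retry_len == 0 && toy_try_charge(sk, len))
		return true;
	if (sk->retry_len == RETRY_MAX)
		return false;
	sk->retry[sk->retry_len++] = len;
	return true;
}

/* The reader consumed len bytes, so uncharge them. */
static void toy_release(struct toy_sock *sk, size_t len)
{
	assert(len <= sk->rmem_alloc);
	sk->rmem_alloc -= len;
}

/*
 * Deferred work (what a workqueue item might do): deliver parked packets
 * in order, stopping at the first one that still does not fit. The
 * verdict is not re-run here. Returns the number of packets delivered.
 */
static size_t toy_retry_work(struct toy_sock *sk)
{
	size_t done = 0;

	while (done < sk->retry_len && toy_try_charge(sk, sk->retry[done]))
		done++;
	for (size_t i = done; i < sk->retry_len; i++)
		sk->retry[i - done] = sk->retry[i];
	sk->retry_len -= done;
	return done;
}
```

With rcvbuf = 100, two 60-byte packets end up as one charged and one
parked; toy_retry_work() delivers the second only after toy_release() has
freed room, and verdict_runs stays at 2. What a real version would do when
the retry queue fills up is left open here.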

> 
> >
> > We need to do a workqueue and then retry later.
> >
> > Final missing piece is that strparser logic would still not handle
> > this correctly.
> >
> > I don't mind spending some time on this today. I'll apply your
> > patch and then add a few fixes for above.
> 
> Ideally, we should move TCP ACK after ->sk_data_ready()
> so that dropping in ->sk_data_ready() would be fine, but this is
> certainly not easy even if it is doable.

IIRC the original hook did this, but there was concern from the TCP
maintainers, so we decided to put hooks on top of TCP vs inside
TCP. It's also helpful for TLS hooks.

> 
> Thanks.


