Re: SCTP memory pressure

Vlad Yasevich <vladislav.yasevich@xxxxxx> · Thu, 10 Jul 2008 09:05:00 -0400

Neil Horman wrote:
On Wed, Jul 09, 2008 at 04:24:47PM -0400, Vlad Yasevich wrote:
Frank.Uretschlaeger@xxxxxxxxxxx wrote:
Sorry,
forgot to mention, the last test was done with Neils latest private SCTP
branch, which was merged with Vlads latest re-transmission fixes.
Best regards,
Frank


Hi Frank

I've been taking a very hard look at the current accounting code and we
definitely don't do thinks correctly...  In fact we allow for so much memory
to be consumed that it's not surprising that we run into this problem.

Currently, if you do not set the SO_RCVBUF, we will allow you to consume
sk_rcvbuf + sctp_mem[1] memory which is just a total overkill.  If you
have a lot of sockets, which I think you do based on what I remember,
each socket can consume it's full receive buffer without triggering
any memory pressure.  Only once the all of the receive buffer is consumed,
will we start counting memory allocations towards the memory pressure 
condition.


This isn't quite correct, we use the sk_wmem_schedule api to do our memory
accounting, which accounts memory on a semi global basis.  The sctp_mem array is
compared to the total amount of allocated memory for all sctp sockets (see
__sk_mem_schedule, where all this math happens).  So we enter memory pressure as
soon as all the sctp sockets in aggregate allocate more than sysctl_sctp_mem[1]
bytes.

Well,  yes what I said isn't quite correct for the case of a ton of sockets, but
it is still correct for as single socket case.

Looking at what's currently in the mainline, we do not call sk_rmem_schedule()
until after we've exceeded our receive buffer.

        if (rx_count >= asoc->base.sk->sk_rcvbuf) {

                if ((asoc->base.sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
                    (!sk_rmem_schedule(asoc->base.sk, chunk->skb->truesize)))
                        goto fail;
        }

This means, that while we are below sk_recbuf, we do not account our memory
allocations against prot->memory_allocated (which is the total allocations 
against SCTP memory pool).  We only start counting once we exceed the default
receive buffer.

So, for a single socket case, we can allocate sk_recvbuf + sctp_mem[1] and
only then do we enter memory pressure.

For the case of multiple socket, every socket is allowed to fully consume
there specified sk_rcvbuf and none of those allocations count against
prot->memory_allocated.

So, in Frank's case, he could conceivably create 10000 sockets and would 
allocate 10000 * rmem_default without triggering memory pressure conditions
because we would never call sk_rmem_schedule.


That being said, I agree that we're clearly using too much memory in Franks
environment.  The fixes in my branch that Frank has been using addresses several
points at which we fail to use the sk_*_schedule api.  I'm sure there are more
that I'm missing, and additionally there are other things that we can likely do
to conserve memory when we enter pressure.

I am trying to see what we can do.  Still need to find time to look at 
Neil's changes.

Looking forward to your thoughts.  Not sure if its ready for inclusion yet, but
I think at least some bits are correct.  Let me know what you think!


Right now, I've changed the above quoted code to do this:

--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -687,6 +687,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct 
sctp_association *asoc,
        struct sk_buff *skb;
        size_t padding, len;
        int rx_count;
+       int have_mem;

        /*
         * check to see if we need to make space for this
@@ -698,10 +699,15 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct 
sctp_association *asoc,
        else
                rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);

-       if (rx_count >= asoc->base.sk->sk_rcvbuf) {
-
-               if ((asoc->base.sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
-                   (!sk_rmem_schedule(asoc->base.sk, chunk->skb->truesize)))
+       have_mem = sk_rmem_schedule(asoc->base.sk, chunk->skb->truesize);
+       if (rx_count >= asoc->base.sk->sk_rcvbuf || !have_mem) {
+               /* if we are still allowed to make allocations
+                * see if we can adjust the receive buffer.
+                */
+               if (have_mem) {
+                       if (!sctp_recvbuf_adjust())
+                               goto fail;
+               } else
                        goto fail;
        }

It's still a work in progress...

The idea is that if sk_rmem_schedule tells us that we still have memory to 
allocated, go ahead and allow allocations.  However, if sk_rmem_shedule tells
us no, even if we have not consumed all of our receive buffer, we still can't
allocate requested memory.

Other pieces of code that would have to go in are receive buffer adjustments
(similar to tcp_rcv_space_adjust) as well as a_rwnd manipulations to make
sure we close the window appropriately.

-vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html