Hi Eric,

On Wed, 2016-09-21 at 16:31 -0700, Eric Dumazet wrote:
> Also does inet_diag properly give the forward_alloc to user ?
>
> $ ss -mua
> State    Recv-Q Send-Q  Local Address:Port   Peer Address:Port
> UNCONN   51584  0                  *:52460              *:*
>	 skmem:(r51584,rb327680,t0,tb327680,f1664,w0,o0,bl0,d575)

Thank you very much for reviewing this!

My bad, there is still a race which leads to temporary negative values
of fwd. I feel the fix is trivial, but it needs some investigation.

> Couldn't we instead use an union of an atomic_t and int for
> sk->sk_forward_alloc ?

That was our first attempt, but we had some issues with memory
scheduling; if we use:

	if (atomic_sub_return(size, &sk->sk_forward_alloc_atomic) < 0) {
		// fwd alloc
	}

that leads to unavoidable, temporary negative values of
sk->sk_forward_alloc.

Another option would be:

again:
	fwd = atomic_read(&sk->sk_forward_alloc_atomic);
	if (fwd > size) {
		if (atomic_cmpxchg(&sk->sk_forward_alloc_atomic, fwd,
				   fwd - size) != fwd)
			goto again;
	} else {
		// fwd alloc
	}

which would be bad under high contention.

I think that using a single atomic to track both scheduling and reclaim
requires a similar loop to resolve reclaim contention. Such contended
retries should be very rare, especially if we can avoid reclaiming on
dequeue, but they would still complicate the code.

Finally, pahole shows that the number of cachelines used by a UDP
socket on x86_64 does not change when adding the two new atomic fields.

> All udp queues/dequeues would manipulate the atomic_t using regular
> atomic ops and use a special skb destructor (instead of sock_rfree())

I tried the special skb destructor and discarded it, but thinking about
it again, I now feel it can simplify the code, regardless of the scheme
used (a rough sketch is at the end of this mail). We will still need to
replace the current in-kernel skb_free_datagram_locked() for UDP with
something else, so a bit of noise in the sunrpc subsystem will remain.

> Also I would not bother 'reclaiming' forward_alloc at dequeue, unless
> udp is under memory pressure.

We discarded that option because it would significantly change the
system behavior, but if there is agreement on it, I think it would make
a really relevant (positive) difference!

> Please share your performance numbers, thanks !

Tested using pktgen as the sender, with random src ports and 64-byte
packets at wire speed on a 10G link, and Jesper's udp_sink
(https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c)
as the receiver, using the L4-tuple rxhash to stress the contention,
and one or more udp_sink instances with reuseport.

nr readers	Kpps (vanilla)	Kpps (patched)
1		241		600
3		1800		2200
6		3450		3950
9		5100		5400
12		6400		6650

Thank you for the very nice suggestions!

Paolo
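
P.S.: to make the "special skb destructor" idea a bit more concrete,
here is a rough, untested sketch, not taken from the actual patches: it
assumes the sk_forward_alloc_atomic field used in the snippets above,
and udp_skb_rfree()/udp_skb_set_owner_r() are just illustrative names.
At dequeue the destructor returns the truesize to the atomic counter
without taking the socket lock and without reclaiming:

	/* Rough sketch only: sk_forward_alloc_atomic is the hypothetical
	 * atomic counter discussed above, the names are illustrative.
	 */
	static void udp_skb_rfree(struct sk_buff *skb)
	{
		struct sock *sk = skb->sk;
		int amt = skb->truesize;

		/* give the memory back to the atomic fwd alloc counter;
		 * no reclaim here, that could be deferred to memory pressure
		 */
		atomic_add(amt, &sk->sk_forward_alloc_atomic);
		atomic_sub(amt, &sk->sk_rmem_alloc);
	}

	static void udp_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
	{
		/* used at enqueue time instead of skb_set_owner_r()/sock_rfree() */
		skb_orphan(skb);
		skb->sk = sk;
		skb->destructor = udp_skb_rfree;
		atomic_add(skb->truesize, &sk->sk_rmem_alloc);
	}

The enqueue path would still do the scheduling via atomic_sub_return()
or the cmpxchg loop shown earlier; the sketch only covers the dequeue
side.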