On 06/27/2018 06:34 AM, Flavio Leitner wrote: > The sock reference is lost when scrubbing the packet and that breaks > TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing > performance impacts of about 50% in a single TCP stream when crossing > network namespaces. > > XPS breaks because the queue mapping stored in the socket is not > available, so another random queue might be selected when the stack > needs to transmit something like a TCP ACK, or TCP Retransmissions. > That causes packet re-ordering and/or performance issues. Note we do not care if another random queue is selected when TCP retransmit after timeout happens. The problem is really when sending a normal train of packets (being retransmission or not). We want all of them going through one queue to avoid reorders. After an idle period (no packets are in any qdisc/NIC queue), we absolutely are okay to select another "random queue". This choice is driven by skb->ooo_okay Most TCP ACK packets are sent while no prior packet is in qdisc, so should have ooo_okay set to 1. > > TSQ breaks because it orphans the packet while it is still in the > host, so packets are queued contributing to the buffer bloat problem. > > Preserving the sock reference fixes both issues. The socket is > orphaned anyways in the receiving path before any relevant action > and on TX side the netfilter checks if the reference is local before > use it. > > Signed-off-by: Flavio Leitner <fbl@xxxxxxxxxx> > --- > Documentation/networking/ip-sysctl.txt | 10 +++++----- > net/core/skbuff.c | 1 - > 2 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt > index ce8fbf5aa63c..f4c042be0216 100644 > --- a/Documentation/networking/ip-sysctl.txt > +++ b/Documentation/networking/ip-sysctl.txt > @@ -733,11 +733,11 @@ tcp_limit_output_bytes - INTEGER > Controls TCP Small Queue limit per tcp socket. > TCP bulk sender tends to increase packets in flight until it > gets losses notifications. With SNDBUF autotuning, this can > - result in a large amount of packets queued in qdisc/device > - on the local machine, hurting latency of other flows, for > - typical pfifo_fast qdiscs. > - tcp_limit_output_bytes limits the number of bytes on qdisc > - or device to reduce artificial RTT/cwnd and reduce bufferbloat. > + result in a large amount of packets queued on the local machine > + (e.g.: qdiscs, CPU backlog, or device) hurting latency of other > + flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes > + limits the number of bytes on qdisc or device to reduce artificial > + RTT/cwnd and reduce bufferbloat. > Default: 262144 > > tcp_challenge_ack_limit - INTEGER > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index b1f274f22d85..f59e98ca72c5 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4911,7 +4911,6 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) > return; > > ipvs_reset(skb); > - skb_orphan(skb); > skb->mark = 0; > } > EXPORT_SYMBOL_GPL(skb_scrub_packet); > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html