I wondered about the queue depth of sfq as well. I swapped the prio and
sfq qdiscs out for fq_codel. The problem persisted, although it looks
like we found another issue: performance on the target host tanked and
never recovered, and we found ksoftirqd eating lots of CPU time. From
what I understand (please correct me if I'm wrong), the NIC's interrupt
handling fell behind and the receive work got deferred to softirq
context (the ksoftirqd threads), dropping performance. A reboot of the
host restored performance and lowered ksoftirqd utilization, although
throughput is still nowhere near the adapter's 40 Gb/s line rate, or
even a quarter of that. I think the i40e driver still has a way to go
before good performance can be expected.

I was looking for some good info on sch_fq last week, but couldn't find
anything. Do you have links?

Thanks,
----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Mon, Jun 1, 2015 at 12:12 PM, Dave Taht <dave.taht@xxxxxxxxx> wrote:
> On Mon, Jun 1, 2015 at 10:06 AM, jsullivan@xxxxxxxxxxxxxxxxxxx
> <jsullivan@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On June 1, 2015 at 12:36 PM Andy Furniss <adf.lists@xxxxxxxxx> wrote:
>>>
>>> Robert LeBlanc wrote:
>>> > -----BEGIN PGP SIGNED MESSAGE-----
>>> > Hash: SHA256
>>> >
>>> > Any ideas on this?
>>>
>>> 40 gig NICs are way beyond anything I've ever done with tc, and I
>>> guess involve some offload = huge "packets".
>>>
>>> It could be that, as sfq has a default qlen of 128 and you are not
>>> actually rate limiting (and doing that may be "interesting" at 80
>>> gig), prio relies on some downstream buffer being full. Perhaps it's
>>> just that at these rates prio cannot dequeue anything for periods of
>>> time, so the 128-packet limit of sfq is overrun even for the highest
>>> prio band.
>>>
>>> This is pure guesswork.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe lartc" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> Alas, I haven't yet read the details of the original post, but I know
>> we just replaced all our sfq leaves with fq_codel because of concerns
>> about the per-flow packet depth on high-speed, high-latency networks.
>
> Yes, older versions of SFQ had a hard limit on queue depth. This was
> "improved" in Linux 3.6 and later, but that work ultimately pointed at
> a need to actively manage the queue depth, which begat sfqred and,
> ultimately, fq_codel.
>
> I note that, these days, the best results we get for TCP-heavy
> *servers and hosts* (not routers, not UDP-heavy services; and the bare
> metal under a VM is a "router" in this context), in the data center,
> at these speeds, now come from sch_fq (from the pacing, fq, and tso
> fixes) and a low setting for tcp_limit_output_bytes.
>
> Example:
>
> https://fasterdata.es.net/host-tuning/linux/fair-queuing-scheduler/
>
> fq_codel remains a great all-around choice, but what pacing in sch_fq
> is doing for servers is really remarkable.
>
> --
> Dave Täht
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast
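P.S. For anyone following along, a rough sketch of the qdisc swap and
the ksoftirqd check described above. The interface name "eth0" is just a
placeholder, and the tc commands need root, so they are only shown as
comments:

```shell
# Swap the prio+sfq setup for fq_codel at the root (needs root, so shown
# as comments only; "eth0" is a placeholder for the actual 40G port):
#
#   tc qdisc replace dev eth0 root fq_codel
#   tc -s qdisc show dev eth0
#
# The ksoftirqd symptom: receive work being deferred to softirq context
# shows up as rapidly climbing NET_RX counters in /proc/softirqs,
# alongside busy ksoftirqd/N threads in top.
grep NET_RX /proc/softirqs
```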
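And a sketch of the sch_fq + tcp_limit_output_bytes tuning Dave
describes. Again "eth0" is a placeholder, and the 131072 value is only
illustrative, not a recommendation from this thread; see the fasterdata
page for actual guidance:

```shell
# sch_fq as the root qdisc plus a low tcp_limit_output_bytes. Both
# commands need root, so they are shown as comments only; "eth0" is a
# placeholder and 131072 is an illustrative value.
#
#   tc qdisc replace dev eth0 root fq
#   sysctl -w net.ipv4.tcp_limit_output_bytes=131072
#
# Read-only check of the current setting:
cat /proc/sys/net/ipv4/tcp_limit_output_bytes
```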