On 2021/3/23 14:37, Ahmad Fatoum wrote:
> Hi,
>
> On 22.03.21 10:09, Yunsheng Lin wrote:
>> Currently pfifo_fast has both the TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
>> flags set, but queue discipline bypass does not work for a lockless
>> qdisc because the skb is always enqueued to the qdisc even when the
>> qdisc is empty, see __dev_xmit_skb().
>>
>> This patch calls sch_direct_xmit() to transmit the skb directly
>> to the driver for an empty lockless qdisc too, which avoids the
>> enqueuing and dequeuing operations. qdisc->empty is set to false
>> whenever an skb is enqueued, see pfifo_fast_enqueue(), and is set to
>> true when skb dequeuing returns NULL, see pfifo_fast_dequeue().
>>
>> There is a data race between enqueue/dequeue and the qdisc->empty
>> setting, so qdisc->empty is only used as a hint. We therefore need to
>> call sch_may_need_requeuing() to see if the queue is really empty and
>> if there is a requeued skb, which has higher priority than the
>> current skb.
>>
>> The performance for the ip_forward test increases by about 10% with
>> this patch.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> ---
>> Hi, Vladimir and Ahmad
>> 	Please give it a test to see if there is any out-of-order
>> packet for this patch, which has removed the priv->lock added in
>> RFC v2.
>
> Overnight test (10h, 64 mil frames) didn't see any out-of-order frames
> between 2 FlexCANs on a dual core machine:
>
> Tested-by: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
>
> No performance measurements taken.

Thanks for the testing.
I have done the performance measurement.

L3 forward testing improves from 1.09Mpps to 1.21Mpps, still about a 10%
improvement.

pktgen + dummy netdev:

threads  without+this_patch   with+this_patch   delta
   1        2.56Mpps            3.11Mpps        +21%
   2        3.76Mpps            4.31Mpps        +14%
   4        5.51Mpps            5.53Mpps        +0.3%
   8        2.81Mpps            2.72Mpps        -3%
  16        2.24Mpps            2.22Mpps        -0.8%

>
>>