Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
> I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
> with Linux v5.5-rc5. Bisecting has lead me down to this commit:
> 
> ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
> 
> With it, using pfifo_fast, every few hundred frames, FlexCAN's .ndo_start_xmit is
> passed frames in an order different from how userspace stuffed them into the same
> socket.
> 
> Reverting it fixes the issue as does booting with maxcpus=1 or using pfifo
> instead of pfifo_fast.
> 
> According to [1], such reordering shouldn't be happening.
> 
> Details on my setup:
> Kernel version: v5.5-rc5, (occurs much more often with LOCKDEP turned on)
> CAN-Bitrate: 250 kbit/s
> CAN frames are generated with:
> cangen canX -I2 -L1 -Di -i -g0.12 -p 100
> which keeps polling after ENOBUFS until socket is writable, sends out a CAN
> frame with one incrementing payload byte and then waits 120 usec before repeating.
> 
> Please let me know if any additional info is needed.

Thank you for the report.

I think there is a possible race condition in the 'empty' flag update
schema:

CPU 0					CPU1
(running e.g. net_tx_action)		(can xmit)

qdisc_run()				__dev_xmit_skb()
pfifo_fast_dequeue				
// queue is empty, returns NULL
WRITE_ONCE(qdisc->empty, true);
					pfifo_fast_enqueue
					qdisc_run_begin() 	
					// still locked by CPU 0,
					// return false and do nothing, 
					// qdisc->empty is still true

					(next can xmit)
					// BYPASS code path
					sch_direct_xmit()
					// send pkt 2
					__qdisc_run()
					// send pkt 1

The following patch try to addresses the above, clearing 'empty' flag
in a more aggressive way. We can end-up skipping the bypass
optimization more often than strictly needed in some contended
scenarios, but I guess that is preferrable to the current issue.

The code is only build-tested, could you please try it in your setup?

Thanks,

Paolo
---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf89592a..df460fe0773a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ad39c87b7fd..3c46575a5af5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3625,6 +3625,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 			qdisc_run_end(q);
 		} else {
 			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+			if (rc != NET_XMIT_DROP && READ_ONCE(q->empty))
+				WRITE_ONCE(q->empty, false);
 			qdisc_run(q);
 		}
 




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux