Network packet reordering with the 5.4 stable tree
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
- Subject: Network packet reordering with the 5.4 stable tree
- From: Ion Badulescu <ionut@xxxxxxxxxx>
- Date: Tue, 29 Sep 2020 10:52:48 -0400
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
Hello,
I ran into some packet reordering using a plain vanilla 5.4.49 kernel
and the Amazon AWS ena driver. The external symptom was that every now
and again, one or more larger packets would be delivered to the UDP
socket after some smaller packets, even though the smaller packets were
sent after the larger packets. They were also delivered late to a packet
socket sniffer, which initially made me suspect an RSS bug in the ena
hardware. Eventually, I modified the ena driver to stamp each packet (by
overwriting its ethernet source mac field) with the id of the RSS queue
that delivered it, and with the value of a per-RSS-queue counter that
incremented for each packet received in that queue. That hack showed RSS
functioning properly, and also showed packets received earlier (with a
smaller counter value) being delivered after packets with a larger
counter value. It established that the reordering was in fact happening
inside the kernel network core.
The breakthrough came from realizing that the ena driver defaults its
rx_copybreak to 256, which matched perfectly the boundary between the
delayed large packets and the smaller packets being delivered first.
After changing ena's rx_copybreak to zero, the reordering issue disappeared.
After a lot of hair pulling, I think I figured out what the issue is --
and it's confined to the 5.4 stable tree. Commit 323ebb61e32b4 (present
in 5.4) introduced queueing for GRO_NORMAL packets received via
napi_gro_frags() -> napi_frags_finish(). Commit 6570bc79c0df (NOT
present in 5.4) extended the same GRO_NORMAL queueing to packets
received via napi_gro_receive() -> napi_skb_finish(). Without
6570bc79c0df, packets received via napi_gro_receive() can get delivered
ahead of the earlier, queued up packets received via napi_gro_frags().
And this is precisely what happens in the ena driver with packets
smaller than rx_copybreak vs packets larger than rx_copybreak.
Interestingly, the 5.4 stable tree does contain a backport of the
upstream c80794323e commit, which purports to fix issues introduced by
323ebb61e32b4 and 6570bc79c0df. But 6570bc79c0df itself is missing...
I don't yet have a 5.4-stable patch, since I wanted to first raise the
issue publicly and confirm I'm not missing something obvious. The patch
would probably involve massaging 6570bc79c0df quite a bit, to avoid
colliding with the already-present c80794323e fix.
Thanks,
-Ion
[Index of Archives]
[Linux Kernel]
[Kernel Development Newbies]
[Linux USB Devel]
[Video for Linux]
[Linux Audio Users]
[Yosemite Hiking]
[Linux Kernel]
[Linux SCSI]