On Tue, Sep 29, 2020 at 10:52:48AM -0400, Ion Badulescu wrote: > Hello, > > I ran into some packet reordering using a plain vanilla 5.4.49 kernel and > the Amazon AWS ena driver. The external symptom was that every now and > again, one or more larger packets would be delivered to the UDP socket after > some smaller packets, even though the smaller packets were sent after the > larger packets. They were also delivered late to a packet socket sniffer, > which initially made me suspect an RSS bug in the ena hardware. Eventually, > I modified the ena driver to stamp each packet (by overwriting its ethernet > source mac field) with the id of the RSS queue that delivered it, and with > the value of a per-RSS-queue counter that incremented for each packet > received in that queue. That hack showed RSS functioning properly, and also > showed packets received earlier (with a smaller counter value) being > delivered after packets with a larger counter value. It established that the > reordering was in fact happening inside the kernel network core. > > The breakthrough came from realizing that the ena driver defaults its > rx_copybreak to 256, which matched perfectly the boundary between the > delayed large packets and the smaller packets being delivered first. After > changing ena's rx_copybreak to zero, the reordering issue disappeared. > > After a lot of hair pulling, I think I figured out what the issue is -- and > it's confined to the 5.4 stable tree. Commit 323ebb61e32b4 (present in 5.4) > introduced queueing for GRO_NORMAL packets received via napi_gro_frags() -> > napi_frags_finish(). Commit 6570bc79c0df (NOT present in 5.4) extended the > same GRO_NORMAL queueing to packets received via napi_gro_receive() -> > napi_skb_finish(). Without 6570bc79c0df, packets received via > napi_gro_receive() can get delivered ahead of the earlier, queued up packets > received via napi_gro_frags(). And this is precisely what happens in the ena > driver with packets smaller than rx_copybreak vs packets larger than > rx_copybreak. > > Interestingly, the 5.4 stable tree does contain a backport of the upstream > c80794323e commit, which purports to fix issues introduced by 323ebb61e32b4 > and 6570bc79c0df. But 6570bc79c0df itself is missing... > > I don't yet have a 5.4-stable patch, since I wanted to first raise the issue > publicly and confirm I'm not missing something obvious. The patch would > probably involve massaging 6570bc79c0df quite a bit, to avoid colliding with > the already-present c80794323e fix. You might want to also cc: the networking mailing list for this type of stuff :) thanks, greg k-h