Re: Network packet reordering with the 5.4 stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 29, 2020 at 10:52:48AM -0400, Ion Badulescu wrote:
> Hello,
> 
> I ran into some packet reordering using a plain vanilla 5.4.49 kernel and
> the Amazon AWS ena driver. The external symptom was that every now and
> again, one or more larger packets would be delivered to the UDP socket after
> some smaller packets, even though the smaller packets were sent after the
> larger packets. They were also delivered late to a packet socket sniffer,
> which initially made me suspect an RSS bug in the ena hardware. Eventually,
> I modified the ena driver to stamp each packet (by overwriting its ethernet
> source mac field) with the id of the RSS queue that delivered it, and with
> the value of a per-RSS-queue counter that incremented for each packet
> received in that queue. That hack showed RSS functioning properly, and also
> showed packets received earlier (with a smaller counter value) being
> delivered after packets with a larger counter value. It established that the
> reordering was in fact happening inside the kernel network core.
> 
> The breakthrough came from realizing that the ena driver defaults its
> rx_copybreak to 256, which matched perfectly the boundary between the
> delayed large packets and the smaller packets being delivered first. After
> changing ena's rx_copybreak to zero, the reordering issue disappeared.
> 
> After a lot of hair pulling, I think I figured out what the issue is -- and
> it's confined to the 5.4 stable tree. Commit 323ebb61e32b4 (present in 5.4)
> introduced queueing for GRO_NORMAL packets received via napi_gro_frags() ->
> napi_frags_finish(). Commit 6570bc79c0df (NOT present in 5.4) extended the
> same GRO_NORMAL queueing to packets received via napi_gro_receive() ->
> napi_skb_finish(). Without 6570bc79c0df, packets received via
> napi_gro_receive() can get delivered ahead of the earlier, queued up packets
> received via napi_gro_frags(). And this is precisely what happens in the ena
> driver with packets smaller than rx_copybreak vs packets larger than
> rx_copybreak.
> 
> Interestingly, the 5.4 stable tree does contain a backport of the upstream
> c80794323e commit, which purports to fix issues introduced by 323ebb61e32b4
> and 6570bc79c0df. But 6570bc79c0df itself is missing...
> 
> I don't yet have a 5.4-stable patch, since I wanted to first raise the issue
> publicly and confirm I'm not missing something obvious. The patch would
> probably involve massaging 6570bc79c0df quite a bit, to avoid colliding with
> the already-present c80794323e fix.

You might want to also cc: the networking mailing list for this type of
stuff :)

thanks,

greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux