Wen Gong <wgong@xxxxxxxxxxxxxx> writes:

> For TCP RX, the quantity of TCP acks to the remote is 1/2 of the
> quantity of TCP data from the remote, so there are many small-length
> packets on the TX path of the SDIO bus, which reduces the RX
> bandwidth of TCP.
>
> This patch enables NAPI on the RX path, so RX packets of TCP will not
> be fed to the TCP stack immediately from mac80211 since GRO is
> enabled by default; they will be fed to the TCP stack after NAPI
> completes. If RX bundling is enabled, the TCP stack is fed once per
> RX bundle. For example, with an RX bundle size of 32, the TCP stack
> will receive one large packet whose length is nearly 1500*32, and
> will send one TCP ack for that large packet, reducing the TCP ack
> ratio from 1/2 to 1/32. This results in a significant performance
> improvement for TCP RX.
>
> TCP RX throughput is 240 Mbps without this patch, and it reaches
> 390 Mbps with this patch. CPU usage shows no obvious difference with
> and without NAPI.

I have not done a thorough review yet, but a few quick questions:

This adds a new skb queue ar->htt.rx_indication_head to the RX path,
but in one of your earlier patches you also add another skb queue
ar_sdio->rx_head. Is it really necessary to have two separate queues in
the RX path? Sounds like extra complexity to me.

The way I have understood it, NAPI is used as a mechanism to disable
interrupts on the device and gain throughput from that. But in your
patch the poll function ath10k_sdio_napi_poll() doesn't touch the
hardware at all; it just processes packets from the
ar->htt.rx_indication_head queue until the budget runs out. I'm no NAPI
expert so I can't claim it's wrong, but at least it feels odd to me.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
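For readers following along: the budget semantics being questioned above can be shown with a minimal userspace sketch (this is not the ath10k code; the queue and function names here are made up for illustration). A NAPI-style poll drains at most `budget` packets per call, and processing fewer than `budget` is the signal, analogous to calling napi_complete(), that the queue is empty and device interrupts could be re-enabled:

```c
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and an skb queue head. */
struct pkt {
	struct pkt *next;
	int len;
};

struct pkt_queue {
	struct pkt *head;
};

/* Process up to `budget` packets from the queue; return how many were
 * handled. A return value less than `budget` means the queue drained,
 * which is where a real driver would call napi_complete() and
 * re-enable interrupts. */
static int poll_queue(struct pkt_queue *q, int budget)
{
	int done = 0;

	while (done < budget && q->head) {
		struct pkt *p = q->head;

		q->head = p->next;
		/* here the packet `p` would be delivered to the stack */
		done++;
	}
	return done;
}
```

Note this sketch mirrors the reviewer's observation: nothing in the loop touches hardware, it only consumes an already-filled software queue, which is what makes the patch's use of NAPI look unusual.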