Wen Gong <wgong@xxxxxxxxxxxxxx> writes: > For tcp RX, the quantity of tcp acks to remote is 1/2 of the quantity > of tcp data from remote, then it will have many small length packets > on TX path of sdio bus, then it reduce the RX packets's bandwidth of > tcp. > > This patch enable napi on RX path, then the RX packet of tcp will not > feed to tcp stack immeditely from mac80211 since GRO is enabled by > default, it will feed to tcp stack after napi complete, if rx bundle > is enabled, then it will feed to tcp stack one time for each bundle > of RX. For example, RX bundle size is 32, then tcp stack will receive > one large length packet, its length is neary 1500*32, then tcp stack > will send a tcp ack for this large packet, this will reduce the tcp > acks ratio from 1/2 to 1/32. This results in significant performance > improvement for tcp RX. > > Tcp rx throughout is 240Mbps without this patch, and it arrive 390Mbps > with this patch. The cpu usage has no obvious difference with and > without NAPI. > > call stack for each RX packet on GRO path: > (skb length is about 1500 bytes) > skb_gro_receive ([kernel.kallsyms]) > tcp4_gro_receive ([kernel.kallsyms]) > inet_gro_receive ([kernel.kallsyms]) > dev_gro_receive ([kernel.kallsyms]) > napi_gro_receive ([kernel.kallsyms]) > ieee80211_deliver_skb ([mac80211]) > ieee80211_rx_handlers ([mac80211]) > ieee80211_prepare_and_rx_handle ([mac80211]) > ieee80211_rx_napi ([mac80211]) > ath10k_htt_rx_proc_rx_ind_hl ([ath10k_core]) > ath10k_htt_rx_pktlog_completion_handler ([ath10k_core]) > ath10k_sdio_napi_poll ([ath10k_sdio]) > net_rx_action ([kernel.kallsyms]) > softirqentry_text_start ([kernel.kallsyms]) > do_softirq ([kernel.kallsyms]) > > call stack for napi complete and send tcp ack from tcp stack: > (skb length is about 1500*32 bytes) > _tcp_ack_snd_check ([kernel.kallsyms]) > tcp_v4_do_rcv ([kernel.kallsyms]) > tcp_v4_rcv ([kernel.kallsyms]) > local_deliver_finish ([kernel.kallsyms]) > ip_local_deliver ([kernel.kallsyms]) > ip_rcv_finish ([kernel.kallsyms]) > ip_rcv ([kernel.kallsyms]) > netif_receive_skb_core ([kernel.kallsyms]) > netif_receive_skb_one_core([kernel.kallsyms]) > netif_receive_skb ([kernel.kallsyms]) > netif_receive_skb_internal ([kernel.kallsyms]) > napi_gro_complete ([kernel.kallsyms]) > napi_gro_flush ([kernel.kallsyms]) > napi_complete_done ([kernel.kallsyms]) > ath10k_sdio_napi_poll ([ath10k_sdio]) > net_rx_action ([kernel.kallsyms]) > __softirqentry_text_start ([kernel.kallsyms]) > do_softirq ([kernel.kallsyms]) > > Tested with QCA6174 SDIO with firmware > WLAN.RMH.4.4.1-00017-QCARMSWP-1. > > Signed-off-by: Wen Gong <wgong@xxxxxxxxxxxxxx> [...] > +void ath10k_sdio_init_napi(struct ath10k *ar) > +{ > + netif_napi_add(&ar->napi_dev, &ar->napi, ath10k_sdio_napi_poll, > + ATH10K_NAPI_BUDGET); > +} This had a new warning: drivers/net/wireless/ath/ath10k/sdio.c:2053:6: warning: no previous prototype for 'ath10k_sdio_init_napi' [-Wmissing-prototypes] drivers/net/wireless/ath/ath10k/sdio.c:2053:6: warning: symbol 'ath10k_sdio_init_napi' was not declared. Should it be static? But as you are calling this function from only ath10k_sdio_probe(), in the pending branch I removed the function and moved the napi call there. -- https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches