On 25/11/2019 09:09, Nicholas Johnson wrote:
> The default value of /proc/sys/net/core/gro_normal_batch was 8.
> Setting it to 1 allowed it to connect to Wi-Fi network.
>
> Setting it back to 8 did not kill the connection.
>
> But when I disconnected and tried to reconnect, it did not re-connect.
>
> Hence, it appears that the problem only affects the initial handshake
> when associating with a network, and not normal packet flow.

That sounds like the GRO batch isn't getting flushed at the end of the
NAPI; maybe the driver isn't calling napi_complete_done() at the
appropriate time?

Indeed, from digging through the layers of iwlwifi I eventually get to
iwl_pcie_rx_handle(), which doesn't really have a NAPI poll (the
napi->poll function is iwl_pcie_dummy_napi_poll() { WARN_ON(1);
return 0; }) and instead calls napi_gro_flush() at the end of its RX
handling. Unfortunately, napi_gro_flush() is no longer enough, because
it doesn't call gro_normal_list(), so the packets on the GRO_NORMAL list
just sit there indefinitely.

It was seeing drivers calling napi_gro_flush() directly that had me
worried in the first place about whether listifying napi_gro_receive()
was safe and where the gro_normal_list() should go.

I wondered if other drivers that show up in [1] needed fixing with a
gro_normal_list() next to their napi_gro_flush() call. From a cursory
check:
brocade/bna: has a real poller, calls napi_complete_done() so is OK.
cortina/gemini: calls napi_complete_done() straight after
 napi_gro_flush(), so is OK.
hisilicon/hns3: calls napi_complete(), so is _probably_ OK.

But it's far from clear to me why *any* of those drivers are calling
napi_gro_flush() themselves...

-Ed

[1]: https://elixir.bootlin.com/linux/latest/ident/napi_gro_flush
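
To make the failure mode described above concrete, here is a toy
userspace model of the batching behaviour. It is a sketch only, not
kernel code: every toy_* name is hypothetical, the structure is reduced
to two counters, and it assumes (as the analysis above does) that
napi_gro_flush() leaves the GRO_NORMAL list untouched while
napi_complete_done() drains it.

```c
#include <assert.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_napi {
	int batch;     /* stands in for gro_normal_batch */
	int rx_count;  /* packets waiting on the GRO_NORMAL list */
	int delivered; /* packets already handed up to the stack */
};

/* Stands in for gro_normal_list(): hand the whole list to the stack. */
static void toy_gro_normal_list(struct toy_napi *n)
{
	n->delivered += n->rx_count;
	n->rx_count = 0;
}

/* Stands in for the GRO_NORMAL path of napi_gro_receive(): queue the
 * packet and only pass the list up once a full batch has accumulated. */
static void toy_gro_receive(struct toy_napi *n)
{
	n->rx_count++;
	if (n->rx_count >= n->batch)
		toy_gro_normal_list(n);
}

/* Stands in for napi_gro_flush(): assumed NOT to touch the GRO_NORMAL
 * list, which is the problem described in the mail. */
static void toy_gro_flush(struct toy_napi *n)
{
	(void)n;
}

/* Stands in for napi_complete_done(): assumed to drain the list too. */
static void toy_complete_done(struct toy_napi *n)
{
	toy_gro_normal_list(n);
	toy_gro_flush(n);
}

/* Receive npkts packets in one RX pass, then finish the pass either with
 * toy_complete_done() or with a bare toy_gro_flush().
 * Returns the number of packets left stuck on the list. */
static int toy_rx_pass(int batch, int npkts, int use_complete_done)
{
	struct toy_napi n = { .batch = batch, .rx_count = 0, .delivered = 0 };

	assert(batch >= 1 && npkts >= 0);
	for (int i = 0; i < npkts; i++)
		toy_gro_receive(&n);

	if (use_complete_done)
		toy_complete_done(&n);
	else
		toy_gro_flush(&n);

	assert(n.rx_count + n.delivered == npkts); /* nothing lost, only delayed */
	return n.rx_count;
}
```

In this model, toy_rx_pass(8, 3, 0) leaves 3 packets stuck, whereas
toy_rx_pass(1, 3, 0) and toy_rx_pass(8, 3, 1) both leave none. That
would be consistent with a short handshake exchange stalling at the
default batch of 8 while steady traffic, which keeps filling batches,
gets through.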