From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>
Date: Thu, 6 Jul 2023 11:43:58 +0000

>
>
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>> Sent: Thursday, July 6, 2023 5:09 PM
>> To: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; souradeep
>> chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>> Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
>> <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
>> <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
>> kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay
>> Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx;
>> cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
>> tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
>> linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
>> stable@xxxxxxxxxxxxxxx
>> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>> when host is unresponsive
>>
>> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>
>> Date: Thu, 6 Jul 2023 10:41:03 +0000
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>>>> Sent: Wednesday, July 5, 2023 8:06 PM
>>
>> [...]
>>
>>>>>> 120 seconds by a 2 msec step is 60000 iterations, by 1 msec it is
>>>>>> 120000 iterations. I know usleep_range() is often much less precise,
>>>>>> but still. Do you really need that much time? Has it been measured
>>>>>> during the tests that it can take up to 120 seconds, or is it just
>>>>>> some random value that "should be enough"?
>>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>>>> work instead of wasting resources.
>>>>> Here the intent is not waiting for 120 seconds, rather to avoid
>>>>> checking the pending_sends of each tx queue for an indefinite time
>>>>> before freeing sk_buffs.
>>>>> The pending_sends of a tx queue can only get decreased if
>>>>> mana_poll_tx_cq() gets called for a completion notification, and
>>>>> increased by xmit.
>>>>>
>>>>> In this particular bug, apc->port_is_up is not set to false, causing
>>>>> xmit to keep increasing the pending_sends for the queue while
>>>>> mana_poll_tx_cq() never gets called for the queue.
>>>>>
>>>>> The comment in the function mana_dealloc_queues() mentions this:
>>>>>
>>>>> /* No packet can be transmitted now since apc->port_is_up is false.
>>>>>  * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>>>>  * a txq because it may not timely see apc->port_is_up being cleared
>>>>>  * to false, but it doesn't matter since mana_start_xmit() drops any
>>>>>  * new packets due to apc->port_is_up being false.
>>>>>
>>>>> The value 120 seconds has been decided here based on the maximum
>>>>> number of queues
>>>>
>>>> This is quite opposite to what you're saying above. How should I
>>>> connect these two:
>>>>
>>>> Here the intent is not waiting for 120 seconds
>>>>
>>>> +
>>>>
>>>> The value 120 seconds has been decided here based on the maximum
>>>> number of queues
>>>>
>>>> ?
>>>> Can cleaning the Tx queues really last for 120 seconds?
>>>> My understanding is that timeouts need to be sensible and not go out
>>>> to outer space. What is the median value you got during the tests?
>>>>
>>> Each queue takes a few milliseconds in a normal condition. So based on
>>> the maximum number of queues allowed for our h/w, it won't go beyond
>>> a second.
>>> The 120s only happens rarely, during some unexpected NIC HW issue.
>>> So this timeout will only trigger in a very rare scenario.
>>
>> So set the timeout to 2 seconds if it makes no difference?
>
> It can go near 120 seconds in a very rare MANA h/w scenario. That
> normally won't happen.
> But during that scenario, we may need 120 seconds.

This waiting loop is needed to let the pending Tx packets be sent. If
they weren't sent within 1 second, it most likely no longer matters
whether they will be sent at all -- the destination host won't wait for
them that long.
You say that it may happen only in case of a HW issue. If so, I assume
it needs to be fixed some way, e.g. by doing a HW reset? And then, why
bother waiting for Tx completions if Tx is hung? You free all the skbs
later either way, so there are no leaks.

>>
>>>>> that are allowed in this specific hardware, so it is a safe
>>>>> assumption.
>>>>>>
>>>>>>>
>>>>>>> +	for (i = 0; i < apc->num_queues; i++) {
>>>>>>> +		txq = &apc->tx_qp[i].txq;
>>>>>>> +		cq = &apc->tx_qp[i].tx_cq;
>>
>> [...]
>>
>> Thanks,
>> Olek

Thanks,
Olek
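
P.S. To make the discussion concrete, below is a rough, untested sketch
of the bounded-wait pattern being debated -- not the actual patch hunk.
It reuses only the names visible in the thread (apc->num_queues,
apc->tx_qp[i].txq, txq->pending_sends); the helper name, the header
path and the hardcoded 120-second cap are illustrative assumptions.

#include <linux/atomic.h>	/* atomic_read() */
#include <linux/delay.h>	/* usleep_range() */
#include <linux/jiffies.h>	/* jiffies, HZ, time_before() */
#include <net/mana/mana.h>	/* struct mana_port_context (assumed path) */

/* Wait, with an upper bound, for every Tx queue to drain its in-flight
 * sends. Returns true if all queues drained before the timeout elapsed.
 */
static bool mana_wait_tx_drain(struct mana_port_context *apc)
{
	/* The cap under debate: worst case across all queues. */
	unsigned long timeout = jiffies + 120 * HZ;
	struct mana_txq *txq;
	int i;

	for (i = 0; i < apc->num_queues; i++) {
		txq = &apc->tx_qp[i].txq;

		/* pending_sends is only decremented by mana_poll_tx_cq();
		 * poll it with the 1-2 ms step discussed above.
		 */
		while (atomic_read(&txq->pending_sends) &&
		       time_before(jiffies, timeout))
			usleep_range(1000, 2000);

		if (atomic_read(&txq->pending_sends))
			return false;	/* Tx looks hung; give up waiting */
	}

	return true;
}

Whether the caller then just frees the remaining skbs on timeout or
escalates to a HW reset is exactly the open question above; either way
nothing leaks, since the skbs are freed afterwards regardless of the
return value.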