>-----Original Message----- >From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> >Sent: Monday, July 3, 2023 10:18 PM >To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> >Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang ><haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui ><decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; >kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay >Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx; >cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; >tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; >linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; >stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx> >Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when >host is unresponsive > >From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> >Date: Mon, 3 Jul 2023 01:49:31 -0700 > >> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> > >Please sync your Git name and Git mail account settings, so that your own >patches won't have "From:" when sending. From what I see, you need to >correct first letters of name and surname to capital in the Git email settings >block. Thank you for pointing, I will fix it. > >> >> When unloading the MANA driver, mana_dealloc_queues() waits for the >> MANA hardware to complete any inflight packets and set the pending >> send count to zero. But if the hardware has failed, >> mana_dealloc_queues() could wait forever. >> >> Fix this by adding a timeout to the wait. Set the timeout to 120 >> seconds, which is a somewhat arbitrary value that is more than long >> enough for functional hardware to complete any sends. >> >> Signed-off-by: Souradeep Chakrabarti >> <schakrabarti@xxxxxxxxxxxxxxxxxxx> > >Where's "Fixes:" tagging the blamed commit? This is present from the day zero of the mana driver code. It has not been introduced in the code by any commit. > >> --- >> V3 -> V4: >> * Fixed the commit message to describe the context. >> * Removed the vf_unload_timeout, as it is not required. >> --- >> drivers/net/ethernet/microsoft/mana/mana_en.c | 26 >> ++++++++++++++++--- >> 1 file changed, 23 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c >> b/drivers/net/ethernet/microsoft/mana/mana_en.c >> index a499e460594b..d26f1da70411 100644 >> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c >> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c >> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct >> net_device *ndev) { >> struct mana_port_context *apc = netdev_priv(ndev); >> struct gdma_dev *gd = apc->ac->gdma_dev; >> + unsigned long timeout; >> struct mana_txq *txq; >> + struct sk_buff *skb; >> + struct mana_cq *cq; >> int i, err; >> >> if (apc->port_is_up) >> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct >net_device *ndev) >> * to false, but it doesn't matter since mana_start_xmit() drops any >> * new packets due to apc->port_is_up being false. >> * >> - * Drain all the in-flight TX packets >> + * Drain all the in-flight TX packets. >> + * A timeout of 120 seconds for all the queues is used. >> + * This will break the while loop when h/w is not responding. >> + * This value of 120 has been decided here considering max >> + * number of queues. >> */ >> + >> + timeout = jiffies + 120 * HZ; > >Why not initialize it right when declaring? I will fix it in next version. > >> for (i = 0; i < apc->num_queues; i++) { >> txq = &apc->tx_qp[i].txq; >> - >> - while (atomic_read(&txq->pending_sends) > 0) >> + while (atomic_read(&txq->pending_sends) > 0 && >> + time_before(jiffies, timeout)) { >> usleep_range(1000, 2000);> + } >> } > >120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000 >iterations. I know usleep_range() often is much less precise, but still. >Do you really need that much time? Has this been measured during the tests >that it can take up to 120 seconds or is it just some random value that "should >be enough"? >If you really need 120 seconds, I'd suggest using a timer / delayed work instead >of wasting resources. Here the intent is not waiting for 120 seconds, rather than avoid continue checking the pending_sends of each tx queues for an indefinite time, before freeing sk_buffs. The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq() gets called for a completion notification and increased by xmit. In this particular bug, apc->port_is_up is not set to false, causing xmit to keep increasing the pending_sends for the queue and mana_poll_tx_cq() not getting called for the queue. If we see the comment in the function mana_dealloc_queues(), it mentions it : 2346 /* No packet can be transmitted now since apc->port_is_up is false. 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable 2348 * a txq because it may not timely see apc->port_is_up being cleared 2349 * to false, but it doesn't matter since mana_start_xmit() drops any 2350 * new packets due to apc->port_is_up being false. The value 120 seconds has been decided here based on maximum number of queues are allowed in this specific hardware, it is a safe assumption. > >> >> + for (i = 0; i < apc->num_queues; i++) { >> + txq = &apc->tx_qp[i].txq; >> + cq = &apc->tx_qp[i].tx_cq; > >cq can be just &txq->tx_cq. mana_txq structure does not have a pointer to mana_cq. > >> + while (atomic_read(&txq->pending_sends)) { >> + skb = skb_dequeue(&txq->pending_skbs); >> + mana_unmap_skb(skb, apc); >> + napi_consume_skb(skb, cq->budget); > >(you already have comment about this one) > >> + atomic_sub(1, &txq->pending_sends); >> + } >> + } >> /* We're 100% sure the queues can no longer be woken up, because >> * we're sure now mana_poll_tx_cq() can't be running. >> */ > >Thanks, >Olek