> -----Original Message----- > From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx> > Sent: Monday, July 3, 2023 3:55 PM > To: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>; souradeep chakrabarti > <schakrabarti@xxxxxxxxxxxxxxxxxxx> > Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang > <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui > <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; > kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay > Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx; > cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; > tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; > stable@xxxxxxxxxxxxxxx > Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload > when host is unresponsive > > > > >-----Original Message----- > >From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> > >Sent: Monday, July 3, 2023 10:18 PM > >To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> > >Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang > ><haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui > ><decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; > >kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay > >Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx; > >cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; > >tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; > >linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; > >stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti > <schakrabarti@xxxxxxxxxxxxx> > >Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when > >host is unresponsive > > > >From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> > >Date: Mon, 3 Jul 2023 01:49:31 -0700 > > > >> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> > > > >Please sync your Git name and Git mail account settings, so that your own > >patches won't have "From:" when sending. From what I see, you need to > >correct first letters of name and surname to capital in the Git email settings > >block. > Thank you for pointing, I will fix it. > > > >> > >> When unloading the MANA driver, mana_dealloc_queues() waits for the > >> MANA hardware to complete any inflight packets and set the pending > >> send count to zero. But if the hardware has failed, > >> mana_dealloc_queues() could wait forever. > >> > >> Fix this by adding a timeout to the wait. Set the timeout to 120 > >> seconds, which is a somewhat arbitrary value that is more than long > >> enough for functional hardware to complete any sends. > >> > >> Signed-off-by: Souradeep Chakrabarti > >> <schakrabarti@xxxxxxxxxxxxxxxxxxx> > > > >Where's "Fixes:" tagging the blamed commit? > This is present from the day zero of the mana driver code. > It has not been introduced in the code by any commit. > > > >> --- > >> V3 -> V4: > >> * Fixed the commit message to describe the context. > >> * Removed the vf_unload_timeout, as it is not required. > >> --- > >> drivers/net/ethernet/microsoft/mana/mana_en.c | 26 > >> ++++++++++++++++--- > >> 1 file changed, 23 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c > >> b/drivers/net/ethernet/microsoft/mana/mana_en.c > >> index a499e460594b..d26f1da70411 100644 > >> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c > >> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c > >> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct > >> net_device *ndev) { > >> struct mana_port_context *apc = netdev_priv(ndev); > >> struct gdma_dev *gd = apc->ac->gdma_dev; > >> + unsigned long timeout; > >> struct mana_txq *txq; > >> + struct sk_buff *skb; > >> + struct mana_cq *cq; > >> int i, err; > >> > >> if (apc->port_is_up) > >> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct > >net_device *ndev) > >> * to false, but it doesn't matter since mana_start_xmit() drops any > >> * new packets due to apc->port_is_up being false. > >> * > >> - * Drain all the in-flight TX packets > >> + * Drain all the in-flight TX packets. > >> + * A timeout of 120 seconds for all the queues is used. > >> + * This will break the while loop when h/w is not responding. > >> + * This value of 120 has been decided here considering max > >> + * number of queues. > >> */ > >> + > >> + timeout = jiffies + 120 * HZ; > > > >Why not initialize it right when declaring? > I will fix it in next version. > > > >> for (i = 0; i < apc->num_queues; i++) { > >> txq = &apc->tx_qp[i].txq; > >> - > >> - while (atomic_read(&txq->pending_sends) > 0) > >> + while (atomic_read(&txq->pending_sends) > 0 && > >> + time_before(jiffies, timeout)) { > >> usleep_range(1000, 2000);> + } > >> } > > > >120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000 > >iterations. I know usleep_range() often is much less precise, but still. > >Do you really need that much time? Has this been measured during the tests > >that it can take up to 120 seconds or is it just some random value that "should > >be enough"? > >If you really need 120 seconds, I'd suggest using a timer / delayed work > instead > >of wasting resources. > Here the intent is not waiting for 120 seconds, rather than avoid continue > checking the > pending_sends of each tx queues for an indefinite time, before freeing > sk_buffs. > The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq() > gets called for a completion notification and increased by xmit. > > In this particular bug, apc->port_is_up is not set to false, causing > xmit to keep increasing the pending_sends for the queue and > mana_poll_tx_cq() > not getting called for the queue. > > If we see the comment in the function mana_dealloc_queues(), it mentions it : > > 2346 /* No packet can be transmitted now since apc->port_is_up is false. > 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable > 2348 * a txq because it may not timely see apc->port_is_up being cleared > 2349 * to false, but it doesn't matter since mana_start_xmit() drops any > 2350 * new packets due to apc->port_is_up being false. > > The value 120 seconds has been decided here based on maximum number of > queues > are allowed in this specific hardware, it is a safe assumption. I agree. Also, this waiting time is usually much shorter than 120 sec. The long wait only happens in rare and unexpected NIC HW non-responding cases. To further reduce the resource consumption, we can double the usleep_range() time in every iteration. So, the number of iterations will be greatly reduced before reaching 120 sec. Thanks, - Haiyang