RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>
> Sent: Monday, July 3, 2023 3:55 PM
> To: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>; souradeep chakrabarti
> <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
> kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay
> Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx;
> cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> stable@xxxxxxxxxxxxxxx
> Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> when host is unresponsive
> 
> 
> 
> >-----Original Message-----
> >From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> >Sent: Monday, July 3, 2023 10:18 PM
> >To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> >Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> ><haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> ><decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
> >kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay
> >Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx;
> >cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> >tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
> >linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> >stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti
> <schakrabarti@xxxxxxxxxxxxx>
> >Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
> >host is unresponsive
> >
> >From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> >Date: Mon,  3 Jul 2023 01:49:31 -0700
> >
> >> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> >
> >Please sync your Git name and Git mail account settings, so that your own
> >patches won't have "From:" when sending. From what I see, you need to
> >correct first letters of name and surname to capital in the Git email settings
> >block.
> Thank you for pointing, I will fix it.
> >
> >>
> >> When unloading the MANA driver, mana_dealloc_queues() waits for the
> >> MANA hardware to complete any inflight packets and set the pending
> >> send count to zero. But if the hardware has failed,
> >> mana_dealloc_queues() could wait forever.
> >>
> >> Fix this by adding a timeout to the wait. Set the timeout to 120
> >> seconds, which is a somewhat arbitrary value that is more than long
> >> enough for functional hardware to complete any sends.
> >>
> >> Signed-off-by: Souradeep Chakrabarti
> >> <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> >
> >Where's "Fixes:" tagging the blamed commit?
> This is present from the day zero of the mana driver code.
> It has not been introduced in the code by any commit.
> >
> >> ---
> >> V3 -> V4:
> >> * Fixed the commit message to describe the context.
> >> * Removed the vf_unload_timeout, as it is not required.
> >> ---
> >>  drivers/net/ethernet/microsoft/mana/mana_en.c | 26
> >> ++++++++++++++++---
> >>  1 file changed, 23 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> index a499e460594b..d26f1da70411 100644
> >> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
> >> net_device *ndev)  {
> >>  	struct mana_port_context *apc = netdev_priv(ndev);
> >>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> >> +	unsigned long timeout;
> >>  	struct mana_txq *txq;
> >> +	struct sk_buff *skb;
> >> +	struct mana_cq *cq;
> >>  	int i, err;
> >>
> >>  	if (apc->port_is_up)
> >> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
> >net_device *ndev)
> >>  	 * to false, but it doesn't matter since mana_start_xmit() drops any
> >>  	 * new packets due to apc->port_is_up being false.
> >>  	 *
> >> -	 * Drain all the in-flight TX packets
> >> +	 * Drain all the in-flight TX packets.
> >> +	 * A timeout of 120 seconds for all the queues is used.
> >> +	 * This will break the while loop when h/w is not responding.
> >> +	 * This value of 120 has been decided here considering max
> >> +	 * number of queues.
> >>  	 */
> >> +
> >> +	timeout = jiffies + 120 * HZ;
> >
> >Why not initialize it right when declaring?
> I will fix it in next version.
> >
> >>  	for (i = 0; i < apc->num_queues; i++) {
> >>  		txq = &apc->tx_qp[i].txq;
> >> -
> >> -		while (atomic_read(&txq->pending_sends) > 0)
> >> +		while (atomic_read(&txq->pending_sends) > 0 &&
> >> +		       time_before(jiffies, timeout)) {
> >>  			usleep_range(1000, 2000);> +		}
> >>  	}
> >
> >120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
> >iterations. I know usleep_range() often is much less precise, but still.
> >Do you really need that much time? Has this been measured during the tests
> >that it can take up to 120 seconds or is it just some random value that "should
> >be enough"?
> >If you really need 120 seconds, I'd suggest using a timer / delayed work
> instead
> >of wasting resources.
> Here the intent is not waiting for 120 seconds, rather than avoid continue
> checking the
> pending_sends  of each tx queues for an indefinite time, before freeing
> sk_buffs.
> The pending_sends can only get decreased for a tx queue,  if mana_poll_tx_cq()
> gets called for a completion notification and increased by xmit.
> 
> In this particular bug, apc->port_is_up is not set to false, causing
> xmit to keep increasing the pending_sends for the queue and
> mana_poll_tx_cq()
> not getting called for the queue.
> 
> If we see the comment in the function mana_dealloc_queues(), it mentions it :
> 
> 2346     /* No packet can be transmitted now since apc->port_is_up is false.
> 2347      * There is still a tiny chance that mana_poll_tx_cq() can re-enable
> 2348      * a txq because it may not timely see apc->port_is_up being cleared
> 2349      * to false, but it doesn't matter since mana_start_xmit() drops any
> 2350      * new packets due to apc->port_is_up being false.
> 
> The value 120 seconds has been decided here based on maximum number of
> queues
> are allowed in this specific hardware, it is a safe assumption.

I agree. Also, this waiting time is usually much shorter than 120 sec. The long 
wait only happens in rare and unexpected NIC HW non-responding cases. To 
further reduce the resource consumption, we can double the usleep_range() 
time in every iteration. So, the number of iterations will be greatly reduced 
before reaching 120 sec.

Thanks,
- Haiyang




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux