RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx> · Mon, 3 Jul 2023 19:55:06 +0000

>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>Sent: Monday, July 3, 2023 10:18 PM
>To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
><haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
><decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
>kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay
>Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx;
>cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
>tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
>linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
>stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>
>Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
>host is unresponsive
>
>From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>Date: Mon,  3 Jul 2023 01:49:31 -0700
>
>> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>
>Please sync your Git name and Git mail account settings, so that your own
>patches won't have "From:" when sending. From what I see, you need to
>correct first letters of name and surname to capital in the Git email settings
>block.
Thank you for pointing, I will fix it.
>
>>
>> When unloading the MANA driver, mana_dealloc_queues() waits for the
>> MANA hardware to complete any inflight packets and set the pending
>> send count to zero. But if the hardware has failed,
>> mana_dealloc_queues() could wait forever.
>>
>> Fix this by adding a timeout to the wait. Set the timeout to 120
>> seconds, which is a somewhat arbitrary value that is more than long
>> enough for functional hardware to complete any sends.
>>
>> Signed-off-by: Souradeep Chakrabarti
>> <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>
>Where's "Fixes:" tagging the blamed commit?
This is present from the day zero of the mana driver code.
It has not been introduced in the code by any commit.
>
>> ---
>> V3 -> V4:
>> * Fixed the commit message to describe the context.
>> * Removed the vf_unload_timeout, as it is not required.
>> ---
>>  drivers/net/ethernet/microsoft/mana/mana_en.c | 26
>> ++++++++++++++++---
>>  1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> index a499e460594b..d26f1da70411 100644
>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
>> net_device *ndev)  {
>>  	struct mana_port_context *apc = netdev_priv(ndev);
>>  	struct gdma_dev *gd = apc->ac->gdma_dev;
>> +	unsigned long timeout;
>>  	struct mana_txq *txq;
>> +	struct sk_buff *skb;
>> +	struct mana_cq *cq;
>>  	int i, err;
>>
>>  	if (apc->port_is_up)
>> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
>net_device *ndev)
>>  	 * to false, but it doesn't matter since mana_start_xmit() drops any
>>  	 * new packets due to apc->port_is_up being false.
>>  	 *
>> -	 * Drain all the in-flight TX packets
>> +	 * Drain all the in-flight TX packets.
>> +	 * A timeout of 120 seconds for all the queues is used.
>> +	 * This will break the while loop when h/w is not responding.
>> +	 * This value of 120 has been decided here considering max
>> +	 * number of queues.
>>  	 */
>> +
>> +	timeout = jiffies + 120 * HZ;
>
>Why not initialize it right when declaring?
I will fix it in next version.
>
>>  	for (i = 0; i < apc->num_queues; i++) {
>>  		txq = &apc->tx_qp[i].txq;
>> -
>> -		while (atomic_read(&txq->pending_sends) > 0)
>> +		while (atomic_read(&txq->pending_sends) > 0 &&
>> +		       time_before(jiffies, timeout)) {
>>  			usleep_range(1000, 2000);> +		}
>>  	}
>
>120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>iterations. I know usleep_range() often is much less precise, but still.
>Do you really need that much time? Has this been measured during the tests
>that it can take up to 120 seconds or is it just some random value that "should
>be enough"?
>If you really need 120 seconds, I'd suggest using a timer / delayed work instead
>of wasting resources.
Here the intent is not waiting for 120 seconds, rather than avoid continue checking the 
pending_sends  of each tx queues for an indefinite time, before freeing sk_buffs.
The pending_sends can only get decreased for a tx queue,  if mana_poll_tx_cq()
gets called for a completion notification and increased by xmit.

In this particular bug, apc->port_is_up is not set to false, causing
xmit to keep increasing the pending_sends for the queue and mana_poll_tx_cq()
not getting called for the queue.

If we see the comment in the function mana_dealloc_queues(), it mentions it :

2346     /* No packet can be transmitted now since apc->port_is_up is false.
2347      * There is still a tiny chance that mana_poll_tx_cq() can re-enable
2348      * a txq because it may not timely see apc->port_is_up being cleared
2349      * to false, but it doesn't matter since mana_start_xmit() drops any
2350      * new packets due to apc->port_is_up being false.

The value 120 seconds has been decided here based on maximum number of queues
are allowed in this specific hardware, it is a safe assumption.
>
>>
>> +	for (i = 0; i < apc->num_queues; i++) {
>> +		txq = &apc->tx_qp[i].txq;
>> +		cq = &apc->tx_qp[i].tx_cq;
>
>cq can be just &txq->tx_cq.
mana_txq  structure does not have a pointer to mana_cq.
>
>> +		while (atomic_read(&txq->pending_sends)) {
>> +			skb = skb_dequeue(&txq->pending_skbs);
>> +			mana_unmap_skb(skb, apc);
>> +			napi_consume_skb(skb, cq->budget);
>
>(you already have comment about this one)
>
>> +			atomic_sub(1, &txq->pending_sends);
>> +		}
>> +	}
>>  	/* We're 100% sure the queues can no longer be woken up, because
>>  	 * we're sure now mana_poll_tx_cq() can't be running.
>>  	 */
>
>Thanks,
>Olek