> -----Original Message-----
> From: Praveen Kumar <kumarpraveen@xxxxxxxxxxxxxxxxxxx>
> Sent: Monday, June 26, 2023 10:13 AM
> To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
> wei.liu@xxxxxxxxxx; Dexuan Cui <decui@xxxxxxxxxxxxx>;
> davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay Sharma
> <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx;
> ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti
> <schakrabarti@xxxxxxxxxxxxx>
> Subject: Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> unresponsive
>
> On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> >
> > This is the second part of the fix.
> >
> > Also this patch adds a new attribute in mana_context, which gets set when
> > mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> > This flag then helps to avoid the timeouts in successive calls.
> >
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> > ---
> > V2 -> V3:
> > * Removed the initialization of vf_unload_timeout
> > * Splitted the patch in two.
> > * Fixed extra space from the commit message.
> > ---
> >  drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
> >  drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
> >  include/net/mana/mana.h                          |  2 ++
> >  3 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..6411f01be0d9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	struct gdma_context *gc = gd->gdma_context;
> >  	struct gdma_general_resp resp = {};
> >  	struct gdma_general_req req = {};
> > +	struct mana_context *ac;
> >  	int err;
> >
> >  	if (gd->pdid == INVALID_PDID)
> >  		return -EINVAL;
> > +	ac = gd->driver_data;
> >
> >  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
> >  			     sizeof(resp));
> > @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	req.hdr.dev_id = gd->dev_id;
> >
> >  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > -	if (err || resp.hdr.status) {
> > +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> >  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> >  			err, resp.hdr.status);
>
> With the !ac->vf_unload_timeout condition, this message may not show err and
> status correctly. You probably want to add explicit information during
> timeouts so that it gives the right information? Or have the err and status
> fields properly updated.
>
> >  		if (!err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..492cb2c6e2cb 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -1,8 +1,10 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> >  /* Copyright (c) 2021, Microsoft Corporation. */
> >
> > +#include "asm-generic/errno.h"
> >  #include <net/mana/gdma.h>
> >  #include <net/mana/hw_channel.h>
> > +#include <net/mana/mana.h>
> >
> >  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
> >  {
> > @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> >  	struct hwc_wq *txq = hwc->txq;
> >  	struct gdma_req_hdr *req_msg;
> >  	struct hwc_caller_ctx *ctx;
> > +	struct mana_context *ac;
> >  	u32 dest_vrcq = 0;
> >  	u32 dest_vrq = 0;
> >  	u16 msg_id;
> >  	int err;
> >
> >  	mana_hwc_get_msg_index(hwc, &msg_id);
> > +	ac = hwc->gdma_dev->driver_data;
>
> Is there a case where gdma_dev can be invalid here? If so, let's check the
> state and then proceed further?

hwc->gdma_dev is assigned shortly after hwc is allocated (see the code
below), so it's valid here.

But hwc->gdma_dev->driver_data is hwc, not "mana_context *ac". There are
two gdma_dev in gdma_context: hwc & mana. You can get ac from:
	hwc->gdma_dev->gdma_context->mana.driver_data

Or, to avoid too many pointer dereferences, I suggest putting
vf_unload_timeout into gdma_context.

int mana_hwc_create_channel(struct gdma_context *gc)
{
	hwc = kzalloc(sizeof(*hwc), GFP_KERNEL);
...
	gd->gdma_context = gc;
	gd->driver_data = hwc;
	hwc->gdma_dev = gd;
	hwc->dev = gc->dev;

Also, mana_gd_send_request()/mana_hwc_send_request() is used in many places,
not just unloading. Should you use the 5 sec timeout value and the
vf_unload_timeout flag in the unloading path only, and avoid touching other
code paths? Please check with the hostnet team for suggestions.

If we decide to let the vf_unload_timeout flag affect all code paths, not
just unloading, then it should be renamed to hwc_timeout, and the second
patch should be submitted separately.

If it is used only for unloading, then since mana_gd_deregister_device() is
used by the PF too, name it something like: unload_hwc_timeout.

Thanks,
- Haiyang
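
For reference, the pointer chain described above can be modeled in a small
user-space sketch. This is not driver code: the structs are pared down to
the fields named in this thread, and sketch_link_hwc() and
sketch_hwc_to_ac() are hypothetical helpers introduced only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver structures. Only the fields
 * discussed in this thread are modeled; everything else is omitted. */
struct gdma_context;

struct gdma_dev {
	struct gdma_context *gdma_context;
	void *driver_data;	/* owner-specific: hwc or mana_context */
};

struct gdma_context {
	struct gdma_dev hwc;	/* hardware channel device */
	struct gdma_dev mana;	/* MANA Ethernet device */
};

struct hw_channel_context {
	struct gdma_dev *gdma_dev;
};

struct mana_context {
	bool vf_unload_timeout;
};

/* Hypothetical helper mirroring the assignments quoted from
 * mana_hwc_create_channel(): the hwc gdma_dev's driver_data is the
 * hw_channel_context itself. */
static void sketch_link_hwc(struct gdma_context *gc,
			    struct hw_channel_context *hwc)
{
	struct gdma_dev *gd = &gc->hwc;

	gd->gdma_context = gc;
	gd->driver_data = hwc;
	hwc->gdma_dev = gd;
}

/* Hypothetical helper: reach the mana_context from an hwc through the
 * "mana" gdma_dev rather than through hwc->gdma_dev->driver_data.
 * Returns NULL if mana.driver_data has not been set. */
static struct mana_context *sketch_hwc_to_ac(struct hw_channel_context *hwc)
{
	assert(hwc && hwc->gdma_dev && hwc->gdma_dev->gdma_context);
	return hwc->gdma_dev->gdma_context->mana.driver_data;
}
```

In this model, hwc->gdma_dev->driver_data compares equal to the hwc pointer,
so casting it to a mana_context would read the wrong object. Moving the
flag into gdma_context would shorten the lookup to
hwc->gdma_dev->gdma_context and remove the dependency on mana.driver_data
being set.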