On Tue, Feb 04, 2025 at 10:10:38PM +0530, Kalesh Anakkur Purayil wrote:
> Hi Leon,
>
> On Tue, Feb 4, 2025 at 5:14 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 04, 2025 at 12:21:23AM -0800, Selvin Xavier wrote:
> > > From: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> > >
> > > There is a possibility that ulp_irq_stop and ulp_irq_start
> > > callbacks will be called when the device is in detached state.
> > > This can cause a crash due to NULL pointer dereference as
> > > the rdev is already freed.
> >
> > Description and code doesn't match. If "possibility" exists, there is
> > no protection from concurrent detach and ulp_irq_stop. If there is such
> > protection, they can't race.
> >
> > The main idea of auxiliary bus is to remove the need to implement driver
> > specific ops.
>
> There is no race condition here.
>
> Let me explain the scenario.
>
> User is doing a devlink reload reinit. As part of that, the Ethernet
> driver first invokes the auxiliary bus suspend callback. The RDMA driver
> will do the unwinding operation and hence rdev will be freed.
>
> After that, during the devlink sequence, Ethernet driver invokes the
> ulp_irq_stop() callback and this resulted in the NULL pointer
> dereference as the RDMA driver is in detached state and the rdev is
> already freed.

Shouldn't devlink reload completely release all auxiliary drivers?
Why are you keeping BNXT RDMA driver during reload?

BNXT core driver controls reload, it shouldn't call to drivers which
doesn't exist.

>
> We are trying to address the NULL pointer dereference issue here.

You are hiding bugs and not fixing them.

>
> The driver specific ops, ulp_irq_stop and ulp_irq_start are required.
> Broadcom Ethernet and RDMA drivers are designed like that to manage
> IRQs between them.
>
> Hope this clarifies your question.
> >
> > >
> > > Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected")
> > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> > > Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
> > > ---
> > >  drivers/infiniband/hw/bnxt_re/main.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> > > index c4c3d67..89ac5c2 100644
> > > --- a/drivers/infiniband/hw/bnxt_re/main.c
> > > +++ b/drivers/infiniband/hw/bnxt_re/main.c
> > > @@ -438,6 +438,8 @@ static void bnxt_re_stop_irq(void *handle, bool reset)
> > >  	int indx;
> > >
> > >  	rdev = en_info->rdev;
> > > +	if (!rdev)
> > > +		return;
> >
> > This can be seen as an example why I'm so negative about assigning NULL
> > to the pointers after object is destroyed. Such assignment makes layer
> > violation much easier job to do.
> >
> > Thanks
> >
> > >  	rcfw = &rdev->rcfw;
> > >
> > >  	if (reset) {
> > > @@ -466,6 +468,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
> > >  	int indx, rc;
> > >
> > >  	rdev = en_info->rdev;
> > > +	if (!rdev)
> > > +		return;
> > >  	msix_ent = rdev->nqr->msix_entries;
> > >  	rcfw = &rdev->rcfw;
> > >  	if (!ent) {
> > > @@ -2438,6 +2442,7 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state)
> > >  	ibdev_info(&rdev->ibdev, "%s: L2 driver notified to stop en_state 0x%lx",
> > >  		   __func__, en_dev->en_state);
> > >  	bnxt_re_remove_device(rdev, BNXT_RE_PRE_RECOVERY_REMOVE, adev);
> > > +	bnxt_re_update_en_info_rdev(NULL, en_info, adev);
> > >  	mutex_unlock(&bnxt_re_mutex);
> > >
> > >  	return 0;
> > > --
> > > 2.5.5
> > >
> >
>
> --
> Regards,
> Kalesh AP
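
For readers following the thread, the sequence described above (suspend
frees rdev and clears the pointer, a later ulp_irq_stop() then finds it
NULL) can be modelled in userspace C. This is only a toy sketch under
stated assumptions: every toy_* name is hypothetical and none of it is
the bnxt_re code. It shows what the proposed guard does mechanically; it
does not settle the objection that the core driver should not be calling
into a detached driver in the first place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver objects (not the real types). */
struct toy_rdev {
	int irq_active;
};

struct toy_en_info {
	struct toy_rdev *rdev;	/* NULL while the RDMA side is detached */
};

/* Models attach: allocate rdev and publish it through en_info. */
struct toy_rdev *toy_attach(struct toy_en_info *en_info)
{
	struct toy_rdev *rdev = calloc(1, sizeof(*rdev));

	if (rdev)
		rdev->irq_active = 1;
	en_info->rdev = rdev;
	return rdev;
}

/* Models the suspend path: unwind rdev, then clear the stale pointer
 * (the step the patch adds after removing the device). */
void toy_suspend(struct toy_en_info *en_info)
{
	free(en_info->rdev);
	en_info->rdev = NULL;
}

/* Models the stop-IRQ callback with the NULL guard from the patch.
 * Returns true if IRQs were stopped, false if the device was detached. */
bool toy_stop_irq(struct toy_en_info *en_info)
{
	struct toy_rdev *rdev = en_info->rdev;

	if (!rdev)
		return false;	/* detached: nothing to stop */

	rdev->irq_active = 0;
	return true;
}
```

Calling toy_attach(), toy_suspend() and then toy_stop_irq() in that
order exercises the guarded path: the last call returns false instead of
dereferencing freed memory. Without the pointer being cleared in
toy_suspend(), the guard would not help, which is why the two changes in
the patch go together.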