Re: [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Leon,

On Tue, Feb 4, 2025 at 5:14 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Feb 04, 2025 at 12:21:23AM -0800, Selvin Xavier wrote:
> > From: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> >
> > There is a possibility that ulp_irq_stop and ulp_irq_start
> > callbacks will be called when the device is in detached state.
> > This can cause a crash due to NULL pointer dereference as
> > the rdev is already freed.
>
> Description and code doesn't match. If "possibility" exists, there is
> no protection from concurrent detach and ulp_irq_stop. If there is such
> protection, they can't race.
>
> The main idea of auxiliary bus is to remove the need to implement driver
> specific ops.

There is no race condition here.

Let me explain the scenario.

User is doing a devlink reload reinit. As part of that, the Ethernet
driver first invokes the auxiliary bus suspend callback. The RDMA driver
will do the unwinding operation and hence rdev will be freed.

After that, during the devlink sequence, Ethernet driver invokes the
ulp_irq_stop() callback and this resulted in the NULL pointer
dereference as the RDMA driver is in detached state and the rdev is
already freed.

We are trying to address the NULL pointer dereference issue here.

The driver specific ops, ulp_irq_stop and ulp_irq_start are required.
Broadcom Ethernet and RDMA drivers are designed like that to manage
IRQs between them.

Hope this clarifies your question.
>
> >
> > Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected")
> > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> > Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
> > ---
> >  drivers/infiniband/hw/bnxt_re/main.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> > index c4c3d67..89ac5c2 100644
> > --- a/drivers/infiniband/hw/bnxt_re/main.c
> > +++ b/drivers/infiniband/hw/bnxt_re/main.c
> > @@ -438,6 +438,8 @@ static void bnxt_re_stop_irq(void *handle, bool reset)
> >       int indx;
> >
> >       rdev = en_info->rdev;
> > +     if (!rdev)
> > +             return;
>
> This can be seen as an example why I'm so negative about assigning NULL
> to the pointers after object is destroyed. Such assignment makes layer
> violation much easier job to do.
>
> Thanks
>
> >       rcfw = &rdev->rcfw;
> >
> >       if (reset) {
> > @@ -466,6 +468,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
> >       int indx, rc;
> >
> >       rdev = en_info->rdev;
> > +     if (!rdev)
> > +             return;
> >       msix_ent = rdev->nqr->msix_entries;
> >       rcfw = &rdev->rcfw;
> >       if (!ent) {
> > @@ -2438,6 +2442,7 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state)
> >       ibdev_info(&rdev->ibdev, "%s: L2 driver notified to stop en_state 0x%lx",
> >                  __func__, en_dev->en_state);
> >       bnxt_re_remove_device(rdev, BNXT_RE_PRE_RECOVERY_REMOVE, adev);
> > +     bnxt_re_update_en_info_rdev(NULL, en_info, adev);
> >       mutex_unlock(&bnxt_re_mutex);
> >
> >       return 0;
> > --
> > 2.5.5
> >



-- 
Regards,
Kalesh AP

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux