Re: [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Leon,

On Wed, Feb 5, 2025 at 12:47 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Feb 04, 2025 at 10:10:38PM +0530, Kalesh Anakkur Purayil wrote:
> > Hi Leon,
> >
> > On Tue, Feb 4, 2025 at 5:14 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 04, 2025 at 12:21:23AM -0800, Selvin Xavier wrote:
> > > > From: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> > > >
> > > > There is a possibility that ulp_irq_stop and ulp_irq_start
> > > > callbacks will be called when the device is in detached state.
> > > > This can cause a crash due to NULL pointer dereference as
> > > > the rdev is already freed.
> > >
> > > Description and code doesn't match. If "possibility" exists, there is
> > > no protection from concurrent detach and ulp_irq_stop. If there is such
> > > protection, they can't race.
> > >
> > > The main idea of auxiliary bus is to remove the need to implement driver
> > > specific ops.
> >
> > There is no race condition here.
> >
> > Let me explain the scenario.
> >
> > User is doing a devlink reload reinit. As part of that, the Ethernet
> > driver first invokes the auxiliary bus suspend callback. The RDMA driver
> > will do the unwinding operation and hence rdev will be freed.
> >
> > After that, during the devlink sequence, Ethernet driver invokes the
> > ulp_irq_stop() callback and this resulted in the NULL pointer
> > dereference as the RDMA driver is in detached state and the rdev is
> > already freed.
>
> Shouldn't devlink reload completely release all auxiliary drivers?
> Why are you keeping BNXT RDMA driver during reload?

That is the current design. BNXT core Ethernet driver will not destroy
the auxiliary device for RDMA, but just calls the suspend callback. As
a result, RDMA driver will remains loaded and remains registered with
the Ethernet driver instance.
> BNXT core driver controls reload, it shouldn't call to drivers which
> doesn't exist.
Since the RDMA driver instance is registered with Ethernet driver,
core Ethernet driver invokes the callback.
>
> >
> > We are trying to address the NULL pointer dereference issue here.
>
> You are hiding bugs and not fixing them.

Yes, but this change is critical for the current design of the driver.
>
> >
> > The driver specific ops, ulp_irq_stop and ulp_irq_start are required.
> > Broadcom Ethernet and RDMA drivers are designed like that to manage
> > IRQs between them.
> >
> > Hope this clarifies your question.
> > >
> > > >
> > > > Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected")
> > > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx>
> > > > Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
> > > > ---
> > > >  drivers/infiniband/hw/bnxt_re/main.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> > > > index c4c3d67..89ac5c2 100644
> > > > --- a/drivers/infiniband/hw/bnxt_re/main.c
> > > > +++ b/drivers/infiniband/hw/bnxt_re/main.c
> > > > @@ -438,6 +438,8 @@ static void bnxt_re_stop_irq(void *handle, bool reset)
> > > >       int indx;
> > > >
> > > >       rdev = en_info->rdev;
> > > > +     if (!rdev)
> > > > +             return;
> > >
> > > This can be seen as an example why I'm so negative about assigning NULL
> > > to the pointers after object is destroyed. Such assignment makes layer
> > > violation much easier job to do.
> > >
> > > Thanks
> > >
> > > >       rcfw = &rdev->rcfw;
> > > >
> > > >       if (reset) {
> > > > @@ -466,6 +468,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
> > > >       int indx, rc;
> > > >
> > > >       rdev = en_info->rdev;
> > > > +     if (!rdev)
> > > > +             return;
> > > >       msix_ent = rdev->nqr->msix_entries;
> > > >       rcfw = &rdev->rcfw;
> > > >       if (!ent) {
> > > > @@ -2438,6 +2442,7 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state)
> > > >       ibdev_info(&rdev->ibdev, "%s: L2 driver notified to stop en_state 0x%lx",
> > > >                  __func__, en_dev->en_state);
> > > >       bnxt_re_remove_device(rdev, BNXT_RE_PRE_RECOVERY_REMOVE, adev);
> > > > +     bnxt_re_update_en_info_rdev(NULL, en_info, adev);
> > > >       mutex_unlock(&bnxt_re_mutex);
> > > >
> > > >       return 0;
> > > > --
> > > > 2.5.5
> > > >
> >
> >
> >
> > --
> > Regards,
> > Kalesh AP
>
>


-- 
Regards,
Kalesh AP

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux