Re: [PATCH rdma] RDMA/bnxt_re: cmds completions handler avoid accessing invalid memeory

Hi All, 

We will work with Red Hat on the final go-ahead.

For now this patch is on hold and not urgent.

Leon, 

Hold this discussion for now.


Kashyap

On Fri, 22 Nov 2024, 18:54 Mohammad Heib, <mheib@xxxxxxxxxx> wrote:
On Sat, Nov 16, 2024 at 01:33:13PM +0530, Selvin Xavier wrote:
> On Thu, Nov 14, 2024 at 5:15 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 14, 2024 at 03:37:30PM +0530, Selvin Xavier wrote:
> > > On Thu, Nov 14, 2024 at 3:34 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 12, 2024 at 03:49:56PM +0200, Mohammad Heib wrote:
> > > > > If the bnxt FW misbehaves (for example because of a FW bug), it can send
> > > > > completions for old cookies that have already been handled by the bnxt
> > > > > driver. If such an old cookie was associated with an old calling context,
> > > > > the driver will try to access that caller's memory again, because the
> > > > > driver never clears the is_waiter_alive flag after the caller successfully
> > > > > completes waiting. This access causes the following kernel panic:
> > > > >
> > > > > Call Trace:
> > > > >  <IRQ>
> > > > >  ? __die+0x20/0x70
> > > > >  ? page_fault_oops+0x75/0x170
> > > > >  ? exc_page_fault+0xaa/0x140
> > > > >  ? asm_exc_page_fault+0x22/0x30
> > > > >  ? bnxt_qplib_process_qp_event.isra.0+0x20c/0x3a0 [bnxt_re]
> > > > >  ? srso_return_thunk+0x5/0x5f
> > > > >  ? __wake_up_common+0x78/0xa0
> > > > >  ? srso_return_thunk+0x5/0x5f
> > > > >  bnxt_qplib_service_creq+0x18d/0x250 [bnxt_re]
> > > > >  tasklet_action_common+0xac/0x210
> > > > >  handle_softirqs+0xd3/0x2b0
> > > > >  __irq_exit_rcu+0x9b/0xc0
> > > > >  common_interrupt+0x7f/0xa0
> > > > >  </IRQ>
> > > > >  <TASK>
> > > > >
> > > > > To avoid this, clear the is_waiter_alive flag every time the caller
> > > > > finishes waiting for a completion.
> Mohammad,
>  We have been looking into whether this is possible. FW shouldn't be
> returning an old cookie. One possibility is that FW crashed and we are
> in the recovery routine.
> Adding this check is okay, but it may be hiding some other error.
> Could you share your test scripts to reproduce this problem? Could you
> also share the vmcore-dmesg?
>
> Thanks
> Selvin
>
I have sent you all the needed data in a separate email.
Thanks,
>
> > > > >
> > > > > Fixes: 691eb7c6110f ("RDMA/bnxt_re: handle command completions after driver detect a timedout")
> > > > > Signed-off-by: Mohammad Heib <mheib@xxxxxxxxxx>
> > > > > ---
> > > > >  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 16 ++++++++--------
> > > > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > >
> > > > Selvin?
> > > Someone is confirming the fix. Will ack in a day. Thanks
> >
> > Thanks
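
For readers following the thread, the idea in the quoted commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the bnxt_re driver code: `struct cmd_slot`, `slot_begin_wait`, `slot_end_wait` and `slot_complete` are hypothetical names, and only `is_waiter_alive` is taken from the patch description. Locking between the waiter and the completion handler is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-cookie command slot. */
struct cmd_slot {
    bool is_waiter_alive;   /* true only while a caller is waiting */
    int *waiter_result;     /* points into the waiting caller's context */
};

/* Caller side: register the waiter before the command is sent. */
static void slot_begin_wait(struct cmd_slot *slot, int *result)
{
    assert(result != NULL);
    slot->waiter_result = result;
    slot->is_waiter_alive = true;
}

/* Caller side: clear the flag every time waiting is over, so a late
 * or duplicated completion can no longer reach the caller's memory. */
static void slot_end_wait(struct cmd_slot *slot)
{
    slot->is_waiter_alive = false;
    slot->waiter_result = NULL;
}

/* Completion side: deliver the status only to a live waiter.
 * Returns true if delivered, false if the completion was stale. */
static bool slot_complete(struct cmd_slot *slot, int status)
{
    if (!slot->is_waiter_alive)
        return false;
    *slot->waiter_result = status;
    return true;
}
```

In this sketch, a completion that arrives after `slot_end_wait()` is dropped instead of dereferencing a pointer into a context that may no longer exist. As noted in the thread, such a check may also hide the underlying cause of the stale cookie.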

