Re: [PATCH rdma-rc] IB/mlx5: Do not remove memory DMA mapping while HCA still holds DMA address

On Tue, Oct 16, 2018 at 01:41:34PM -0400, Doug Ledford wrote:
> On Thu, 2018-10-11 at 08:20 +0300, Leon Romanovsky wrote:
> > On Wed, Oct 10, 2018 at 02:55:58PM -0400, Doug Ledford wrote:
> > > On Wed, 2018-10-10 at 09:56 +0300, Leon Romanovsky wrote:
> > > > From: Valentine Fatiev <Valentinef@xxxxxxxxxxxx>
> > > >
> > > > The function that puts the MR back in the cache also removes the DMA
> > > > address from the HCA. Therefore we need to call this function before we
> > > > remove the DMA mapping from the MMU. Otherwise the HCA may access memory
> > > > that is no longer DMA mapped.
> > > >
> > > > Call trace:
> > > > NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
> > > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4
> > > > Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012
> > > > RIP: 0010:intel_idle+0x73/0x120
> > > > Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02
> > > > RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046
> > > > RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001
> > > > RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000
> > > > RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9
> > > > R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000
> > > > R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d
> > > > FS:  0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0
> > > > Call Trace:
> > > >  cpuidle_enter_state+0x7e/0x2e0
> > > >  do_idle+0x1ed/0x290
> > > >  cpu_startup_entry+0x6f/0x80
> > > >  start_kernel+0x524/0x544
> > > >  ? set_init_arg+0x55/0x55
> > > >  secondary_startup_64+0xa4/0xb0
> > > > DMAR: DRHD: handling fault status reg 2
> > > > DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set
> > > > DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set
> > > >
> > > > Fixes: f3f134f5260a ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory")
> > > > Signed-off-by: Valentine Fatiev <valentinef@xxxxxxxxxxxx>
> > > > Reviewed-by: Moni Shoua <monis@xxxxxxxxxxxx>
> > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > >
> > > With a fixup to the Subject of the patch to keep within git guidelines,
> > > patch applied to for-rc.
> >
> > Thanks Doug,
> >
> > Can I ask for a favor? Majd pointed out to me that commit f3f134f5260a
> > ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory")
> > was accepted a couple of releases ago, so the problematic commit
> > exists in stable kernels.
> >
> > Can you please add the following line to the commit message?
> > "Cc: <stable@xxxxxxxxxxxxxxx> # 4.16"
>
> Sorry Leon, I missed this before I sent the pull request to gkh (I blame
> spending so much time (not always) on mute during concalls last week).
> We'll have to separately send something to stable for this (I had thought
> it was just in this release cycle that the original bug crept in, but I
> admit I didn't check the tag --contains value, I just figured a crash
> issue that bad wouldn't sneak past us for several releases...my bad).

No problem, I'll send a request for stable inclusion later on.
Regarding the crash, we found it in a specific scenario while running
qperf with very big messages (> 100000).

Thanks

>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
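
For readers following the thread, the ordering constraint from the patch
description can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: it is not mlx5 driver code, and every name
in it (sim_state, sim_hca_access, sim_release_from_hca, sim_dma_unmap,
sim_teardown) is hypothetical. It models one buffer and one device, and
counts device accesses that would hit an address that is no longer mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not mlx5 code: one buffer and one device (HCA). */
struct sim_state {
    bool dma_mapped;     /* buffer currently has a DMA mapping */
    bool hca_holds_addr; /* device still has the DMA address programmed */
    int faults;          /* device accesses that hit an unmapped address */
};

/* The device may touch the buffer at any time while it holds the address. */
static void sim_hca_access(struct sim_state *s)
{
    if (s->hca_holds_addr && !s->dma_mapped)
        s->faults++;
}

/* Stand-in for "put the MR back in the cache": the device drops the address. */
static void sim_release_from_hca(struct sim_state *s)
{
    s->hca_holds_addr = false;
}

/* Stand-in for removing the DMA mapping. */
static void sim_dma_unmap(struct sim_state *s)
{
    s->dma_mapped = false;
}

/* Tear down in the given order and return the number of simulated faults. */
static int sim_teardown(bool release_first)
{
    struct sim_state s = { .dma_mapped = true, .hca_holds_addr = true, .faults = 0 };

    if (release_first) {
        sim_release_from_hca(&s);
        sim_hca_access(&s); /* device activity between the two steps */
        sim_dma_unmap(&s);
    } else {
        sim_dma_unmap(&s);
        sim_hca_access(&s); /* device still holds the address here */
        sim_release_from_hca(&s);
    }
    sim_hca_access(&s); /* device activity after teardown */

    /* Both orders end with the mapping gone and the address released. */
    assert(!s.dma_mapped && !s.hca_holds_addr);
    return s.faults;
}
```

In this model, sim_teardown(true) reports no faults, while sim_teardown(false)
reports one: the access that happens after the unmap but before the device
has dropped the address. That window is what the DMAR "PTE Read access is not
set" lines in the quoted trace would correspond to, assuming the model
matches the real failure.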

