On Thu, Jan 10, 2019 at 01:58:31PM +0100, Frederic Barrat wrote:
>
>
> On 10/01/2019 at 13:25, Michael Ellerman wrote:
> > Greg Kurz <groug@xxxxxxxx> writes:
> > > On Wed, 9 Jan 2019 17:45:53 +0100
> > > Frederic Barrat <fbarrat@xxxxxxxxxxxxx> wrote:
> > >
> > > > On 09/01/2019 at 17:25, Greg Kurz wrote:
> > > > > On Wed, 9 Jan 2019 16:13:42 +0100
> > > > > Frederic Barrat <fbarrat@xxxxxxxxxxxxx> wrote:
> > > > > > With a recent change around IOMMU group, a system with an opencapi
> > > > > > adapter is no longer booting and we get a kernel oops:
> > > > > >
> > > > > > BUG: Kernel NULL pointer dereference at 0x00000028
> > > > > > Faulting instruction address: 0xc0000000000aa38c
> > > > > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > > > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > > > > Modules linked in:
> > > > > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> > > > > > NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> > > > > > REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
> > > > > > MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
> > > > > > CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> > > > > > GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> > > > > > GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> > > > > > GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> > > > > > GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> > > > > > GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> > > > > > GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> > > > > > GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> > > > > > GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> > > > > > NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > > > > LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > > > > Call Trace:
> > > > > > [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> > > > > > [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > > > > [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > > > > [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> > > > > > [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> > > > > > [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> > > > > > [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> > > > > > [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> > > > > >
> > > > > > An opencapi device is using a device PE, so the current code breaks
> > > > > > because pe->pbus is not defined.
> > > > > >
> > > > > > More generally, there's no need to define an IOMMU group for opencapi,
> > > > > > as the device sends real addresses directly (admittedly, the
> > > > > > virtualization story is yet to be written). So let's fix it by
> > > > >
> > > > > Current plan is to go for mediated VFIO. The real HW stays under the control
> > > > > of the host ocxl driver, and we still don't need an IOMMU group.
> > > > > > skipping the IOMMU group setup for opencapi PHBs.
> > > > > >
> > > > > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > > > > Signed-off-by: Frederic Barrat <fbarrat@xxxxxxxxxxxxx>
> > > > > > ---
> > > > >
> > > > > Reviewed-by: Greg Kurz <groug@xxxxxxxx>
> > > > >
> > > > > and
> > > > >
> > > > > Cc: stable@xxxxxxxxxxxxxxx # v4.20
> > > >
> > > > Thanks for the review! But why did you add stable? that problem is only
> > > > seen on 5.0-rc1, isn't it?
> > >
> > > Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> > > tested :)
> >
> > It was committed to a branch based off 4.20-rc2, but it wasn't merged
> > into the 4.20 release.
> >
> > $ git describe --match "v[0-9]*" --contains 0bd971676e68
> > v5.0-rc1~137^2~15
> >
> > So it doesn't need to go to stable.
>
> Which makes me wonder if Greg (KH) was really talking about that original
> patch and whether something worthwhile was dropped from stable by mistake?

Totally different thread, sorry for the noise, my fault...

greg k-h