Re: Kernel 5.8 and 5.9 fail to boot on C8000

* Helge Deller <deller@xxxxxx>:
> On 10/21/20 5:52 PM, James Bottomley wrote:
> > On Tue, 2020-10-20 at 15:45 +0200, Helge Deller wrote:
> >> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> >> machines with this error:
> >>  mptspi: probe of 0000:40:01.0 failed with error -12
> >>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
> >>  mptspi: probe of 0000:40:01.1 failed with error -12
> >
> > I think you've already figured out that this is an allocation issue.
> > However, it does seem fishy; the code is
> >
> > 	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_KERNEL);
> > 	if (ioc == NULL) {
> > 		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
> > 		return -ENOMEM;
> > 	}
> >
> > And MPT_ADAPTER should be just under a page, which looks like a very odd
> > allocation to fail so early in boot.  The memory subsystem should have
> > also printed out a trace explaining why it failed the allocation.
>
> I think there are a few issues here.
> First, the allocation issue as seen above is from a current git head,
> where it seems memory allocation is somewhat broken. For now I would ignore it
> until git head stabilizes...
>
> Second, my machine has two U320 drives, one "SEAGATE ST373307LW" and one
> "HP 73.4GMAW3073NP". It seems both drives are starting to fail, because
> even in the firmware, when running "search for boot devices", they sometimes
> fail to be detected.
>
> The good thing about bad drives is that they make it possible to
> debug error code paths in the drivers. In my case the last syslog
> looks like this (I'm currently testing with Linus' plain v5.9 kernel).
>
> +[ 1126.041880] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
> +Begin: Waiting for root file system ...
> +[ 1127.069515] scsi host2: error handler thread failed to spawn, error = -4
> +[ 1127.069515] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
> +<Cpu1> 78000c6201e00000  a0e008c01100b009  CC_PAT_ENCODED_FIELD_WARNING
> +<Cpu1> 76000c6801e00000  0000000000000520  CC_PAT_DATA_FIELD_WARNING
> <XXX: something is missing here - the serial port is often not fast enough...>
> +[ 1127.069515] Backtrace:
> +[ 1127.069515]  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
> +[ 1127.069515]  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
> +[ 1127.069515]  [<0000000040add668>] really_probe+0x1bc/0x988
> +[ 1127.069515]  [<0000000040ade704>] driver_probe_device+0x160/0x218
> +[ 1127.069515]  [<0000000040adee24>] device_driver_attach+0x160/0x188
> +[ 1127.069515]  [<0000000040adef90>] __driver_attach+0x144/0x320
> +[ 1127.069515]  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
> +[ 1127.069515]  [<0000000040adc138>] driver_attach+0x4c/0x80
> +[ 1127.069515]  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
> +[ 1127.069515]  [<0000000040ae0130>] driver_register+0xf4/0x298
> +[ 1127.069515]  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
> +[ 1127.069515]  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]
> +[ 1127.069515]  [<0000000040200f18>] do_one_initcall+0x74/0x314
> +[ 1127.069515]  [<00000000403528c0>] do_init_module+0xb4/0x640
> +[ 1127.069515]  [<0000000040356a24>] load_module+0x3a48/0x493c
> +[ 1127.069515]  [<0000000040357d58>] __do_sys_finit_module+0x120/0x1bc
> +[ 1127.069515]  [<0000000040357e84>] sys_finit_module+0x30/0xa0
> +[ 1127.069515]  [<0000000040210054>] syscall_exit+0x0/0x14
> +[ 1127.069515]
> +[ 1127.069515] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000000007d0
> +[ 1127.069515] CPU: 1 PID: 94 Comm: systemd-udevd Tainted: G            E     5.9.0-1-parisc64 #1 Debian 5.9.1-1
> +[ 1127.069515] Hardware name: 9000/785/C8000
> +[ 1127.069515]
> +[ 1127.069515]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> +[ 1127.069515] PSW: 00001000000011101111111000001111 Tainted: G            E
> +[ 1127.069515] r00-03  000000ff080efe0f 000000413a6a4d60 000000000c1f8be8 000000413a6a4e00
> +[ 1127.069515] r04-07  000000000c1f7000 0000004087ce3000 000000007f41e000 0000000000000000
> +[ 1127.069515] r08-11  0000004087ce3000 000000001045e500 000000001045e6f8 000000004158ea68
> +[ 1127.069515] r12-15  0000000000000002 0000000000000000 000000413a6a44a0 0000000040f92680
> +[ 1127.069515] r16-19  0000000000000cc0 0000000000000002 000000001045eaa0 0000000005c47000
> +[ 1127.069515] r20-23  000000000800000e 000000004c2ce5ae 0000000000000384 0000000000000000
> +[ 1127.069515] r24-27  0000000000000143 000000000800000e 0000000000000000 000000000c1f7000
> +[ 1127.069515] r28-31  00000000000005c8 000000413a6a4e70 000000413a6a4ea0 0000000041430aa0
> +[ 1127.069515] sr00-03  0000000000002800 0000000000000000 0000000000000000 0000000000019000
>
> The string "WARNING - Unable to register controller with SCSI subsystem" is
> from drivers/message/fusion/mptspi.c: mptspi_probe():
>         sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
>         if (!sh) {
>                 printk(MYIOC_s_WARN_FMT
>                         "Unable to register controller with SCSI subsystem\n",
>                         ioc->name);
>                 error = -1;
>                 goto out_mptspi_probe;
>         }
>
> So the kernel jumps to:
> out_mptspi_probe:
>         mptscsih_remove(pdev);
>         return error;
>
> Somewhere inside mptscsih_remove() the kernel crashes with a "Data memory access rights trap".
> At first I assumed ioc->sh had an invalid value, but debugging showed that it's 0UL.
> Do you have an idea what's going wrong in mptscsih_remove()?
> I'd expect the kernel to free all memory, ignore those drives and continue booting (and fail
> later in the boot process because the root drive isn't found then).
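A likely explanation for why a NULL ioc->sh still crashes: in the kernel, shost_priv() is an inline helper that returns (void *)shost->hostdata, where hostdata[] is a flexible array member at the end of struct Scsi_Host. Presumably mptscsih_remove() takes its host pointer from ioc->sh, so with a NULL host the helper yields the small, non-zero offset of hostdata rather than NULL. The `(hd = shost_priv(host)) == NULL` test at the top of mptscsih_remove() then never fires, and later accesses through hd hit an address near zero, which would be consistent with the fault address 00000000000007d0 in the log. The sketch below illustrates only the arithmetic. struct mock_shost, priv_addr() and check_null_host_priv() are hypothetical and do not reflect the real struct layout. The address is computed with integers to avoid the undefined behaviour of touching a real NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct Scsi_Host. Like the real structure it
 * ends in a flexible array member named hostdata[], but the members in
 * front of it are made up and far fewer. */
struct mock_shost {
	int host_no;
	void *hostt;
	unsigned long hostdata[];
};

/* Address a shost_priv()-style helper ("return (void *)shost->hostdata;")
 * would compute for a host located at 'base'. Integer arithmetic is used
 * so that base == 0 is well defined here, unlike a real NULL dereference. */
static uintptr_t priv_addr(uintptr_t base)
{
	return base + offsetof(struct mock_shost, hostdata);
}

/* For a NULL host the result is the offset of hostdata, which is never 0
 * because other members precede it, so a "priv == NULL" test cannot fire. */
static void check_null_host_priv(void)
{
	assert(priv_addr(0) == offsetof(struct mock_shost, hostdata));
	assert(priv_addr(0) != 0);
}
```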

Anyone can trigger the fault (on any architecture) with this patch:

diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c
index eabc4de5816c..1f26ecea4c95 100644
--- a/drivers/message/fusion/mptspi.c
+++ b/drivers/message/fusion/mptspi.c
@@ -1404,6 +1404,7 @@ mptspi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}

 	sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
+	sh = NULL;

 	if (!sh) {
 		printk(MYIOC_s_WARN_FMT
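Forcing scsi_host_alloc()'s result to NULL is a simple form of fault injection. A user-space sketch of the same idea might look like the code below. It uses a hypothetical maybe_alloc() wrapper that fails once on demand and a hypothetical mock_probe() that takes its error path. Neither function exists in the driver. Within the kernel itself, the fault-injection framework (for example failslab, enabled with CONFIG_FAILSLAB) can make slab allocations fail without editing driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch: when set, the next maybe_alloc() call fails once,
 * mimicking the "sh = NULL;" debugging hack above. */
static int fail_next;

static void *maybe_alloc(size_t size)
{
	if (fail_next) {
		fail_next = 0;
		return NULL;
	}
	return calloc(1, size);
}

struct mock_dev {
	void *host;   /* stand-in for the Scsi_Host pointer */
	int probed;   /* set only when probing succeeded */
};

/* Probe-style function: on allocation failure it returns -1, the value
 * mptspi_probe() uses for this case (the real driver additionally runs
 * mptscsih_remove() before returning). */
static int mock_probe(struct mock_dev *dev)
{
	dev->host = maybe_alloc(64);
	if (!dev->host)
		return -1;
	dev->probed = 1;
	return 0;
}
```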


With the patch below, the driver now exits cleanly:

[ 1119.508147] Fusion MPT base driver 3.04.20
[ 1119.508147] Copyright (c) 1999-2008 LSI Corporation
[ 1119.508147] Fusion MPT SPI Host driver 3.04.20
[ 1119.508147] mptbase: ioc0: Initiating bringup
[ 1119.508147] sr 1:0:0:0: [sr0] scsi3-mmc drive: 40x/40x cd/rw xa/form2 cdda tray
[ 1119.508147] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 1119.508147] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
[ 1121.512619] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
[ 1121.512619] mptspi: probe of 0000:40:01.0 failed with error -1
[ 1121.512619] mptbase: ioc1: Initiating bringup
[ 1122.508645] ioc1: LSI53C1030 B2: Capabilities={Initiator,Target}
[ 1122.508645] mptspi: ioc1: WARNING - Unable to register controller with SCSI subsystem
[ 1123.417139] mptspi: probe of 0000:40:01.1 failed with error -1
[ 1123.487494] Fusion MPT FC Host driver 3.04.20
[ 1123.487494] Fusion MPT SAS Host driver 3.04.20
[ 1123.487494] Fusion MPT misc device (ioctl) driver 3.04.20
[ 1123.487494] mptctl: Registered with Fusion MPT base driver
[ 1123.487494] mptctl: /dev/mptctl @ (major,minor=10,220)


I'll send this patch to the scsi mailing list shortly:


[PATCH] scsi: mptfusion: Fix error paths in mptscsih_remove()

Signed-off-by: Helge Deller <deller@xxxxxx>

diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index 8543f0324d5a..0d1b2b0eb843 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1176,8 +1176,10 @@ mptscsih_remove(struct pci_dev *pdev)
 	MPT_SCSI_HOST		*hd;
 	int sz1;

-	if((hd = shost_priv(host)) == NULL)
-		return;
+	if (host == NULL)
+		hd = NULL;
+	else
+		hd = shost_priv(host);

 	mptscsih_shutdown(pdev);

@@ -1193,14 +1195,15 @@ mptscsih_remove(struct pci_dev *pdev)
 	    "Free'd ScsiLookup (%d) memory\n",
 	    ioc->name, sz1));

-	kfree(hd->info_kbuf);
+	if (hd)
+		kfree(hd->info_kbuf);

 	/* NULL the Scsi_Host pointer
 	 */
 	ioc->sh = NULL;

-	scsi_host_put(host);
-
+	if (host)
+		scsi_host_put(host);
 	mpt_detach(pdev);

 }
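The pattern this patch applies can be sketched in user space. A teardown routine that is shared with a probe error path must tolerate a partially initialised adapter. Each step that touches the host is therefore guarded, the private-data pointer is derived only from a non-NULL host, and the unconditional steps still run. All names here (mock_ioc, mock_host, mock_hd, mock_remove) are hypothetical simplifications rather than the fusion driver's structures. free() and a hand-rolled reference count stand in for kfree() and scsi_host_put().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct mock_hd {
	char *info_kbuf;          /* buffer owned by the host private data */
};

struct mock_host {
	int refs;                 /* stand-in for the host reference count */
	struct mock_hd priv;      /* stand-in for the shost_priv() area */
};

struct mock_ioc {
	struct mock_host *sh;     /* NULL when host allocation failed in probe */
	int detached;             /* set by the stand-in for mpt_detach() */
};

/* Teardown that works on a partially initialised adapter: hd is derived
 * only from a non-NULL host, and every host-related step is guarded. */
static void mock_remove(struct mock_ioc *ioc)
{
	struct mock_host *host = ioc->sh;
	struct mock_hd *hd = host ? &host->priv : NULL;

	if (hd) {
		free(hd->info_kbuf);
		hd->info_kbuf = NULL;
	}

	ioc->sh = NULL;

	/* stand-in for scsi_host_put(): drop a reference, free on last put */
	if (host && --host->refs == 0)
		free(host);

	ioc->detached = 1;        /* stand-in for mpt_detach(), always runs */
}
```

Calling mock_remove() on an adapter whose sh is NULL just clears state and marks it detached, which mirrors what the patched mptscsih_remove() aims for on the probe error path.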



