[mptscsi] endless Domain Validation loop with external RAID enclosure

Hello Linux SCSI gurus,

I am having trouble getting an external Brownie BR8600 SCSI-to-IDE RAID
enclosure to work on our new HP ProLiant ML350 file server. The SCSI
controller used is the integrated HP LSI-based MPT SCSI controller:

0000:06:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
0000:09:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)

With Linux kernel 2.6.18.1 (official kernel.org, no added patches), the
mptspi module goes into endless Domain Validation retries. I can get the
RAID enclosure to work using the stock 2.6.8-2 kernel from Debian sarge
(a SCSI protocol error is reported after loading the mptscsih module,
but the device works OK afterwards). However, this now quite old kernel
release lacks the bnx2 driver module that is needed for the additional
Broadcom NetXtreme II based network cards.

I understand that the problem is probably caused by buggy firmware in
the BR8600 RAID system, but we have 5 of those in production, each with
1.75 TB of online storage, and they used to work fine on servers running
2.4.x kernels. Replacing these RAID systems is not really an affordable
option at the moment.

So, is there a way to tell the kernel SCSI subsystem or the mptspi module
to ignore this Domain Validation failure? Besides, shouldn't this kind of
endless loop be considered a kernel bug, even if it is triggered by buggy
peripheral firmware (after all, the device works on kernel 2.4.x and
2.6.8-2)?

Here is an extract from the kernel logs on the 2.6.18.1 kernel:

Nov 13 17:06:32 filer1 kernel: Fusion MPT base driver 3.04.01
Nov 13 17:06:32 filer1 kernel: Copyright (c) 1999-2005 LSI Logic Corporation
Nov 13 17:06:32 filer1 kernel: Fusion MPT SPI Host driver 3.04.01
Nov 13 17:06:32 filer1 kernel: ACPI: PCI Interrupt 0000:06:01.0[A] -> GSI 48 (level, low) -> IRQ 209
Nov 13 17:06:32 filer1 kernel: mptbase: Initiating ioc0 bringup
Nov 13 17:06:32 filer1 kernel: ioc0: 53C1030: Capabilities={Initiator,Target}
Nov 13 17:06:32 filer1 kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=209
Nov 13 17:06:37 filer1 kernel: ACPI: PCI Interrupt 0000:09:01.0[A] -> GSI 76 (level, low) -> IRQ 217
Nov 13 17:06:37 filer1 kernel: mptbase: Initiating ioc1 bringup
Nov 13 17:06:37 filer1 kernel: ioc1: 53C1030: Capabilities={Initiator,Target}
Nov 13 17:06:38 filer1 kernel: scsi3 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217
Nov 13 17:06:42 filer1 kernel: ACPI: PCI Interrupt 0000:02:03.0[A] -> GSI 24 (level, low) -> IRQ 225
Nov 13 17:06:42 filer1 kernel: mptbase: Initiating ioc2 bringup
Nov 13 17:06:43 filer1 kernel: ioc2: 53C1030: Capabilities={Initiator,Target}
Nov 13 17:06:43 filer1 kernel: scsi4 : ioc2: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225
Nov 13 17:06:48 filer1 kernel: ACPI: PCI Interrupt 0000:02:03.1[B] -> GSI 25 (level, low) -> IRQ 233
Nov 13 17:06:48 filer1 kernel: mptbase: Initiating ioc3 bringup
Nov 13 17:06:49 filer1 kernel: ioc3: 53C1030: Capabilities={Initiator,Target}
Nov 13 17:06:49 filer1 kernel: scsi5 : ioc3: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=233
Nov 13 17:06:52 filer1 kernel: Vendor: BROWNIE Model: 8600U3 Rev: 0001
Nov 13 17:06:52 filer1 kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 13 17:06:52 filer1 kernel:  target5:0:8: Beginning Domain Validation
Nov 13 17:07:02 filer1 kernel: mptscsih: ioc3: attempting task abort! (sc=f7a21980)
Nov 13 17:07:02 filer1 kernel: scsi 5:0:8:0:
Nov 13 17:07:02 filer1 kernel:         command: Inquiry: 12 00 00 00 90 00
Nov 13 17:07:04 filer1 kernel: mptbase: Initiating ioc3 recovery
Nov 13 17:07:10 filer1 kernel: mptscsih: ioc3: task abort: SUCCESS (sc=f7a21980)
Nov 13 17:07:10 filer1 kernel:  target5:0:8: Domain Validation detected failure, dropping back
Nov 13 17:07:10 filer1 kernel:  target5:0:8: Domain Validation skipping write tests
Nov 13 17:07:10 filer1 kernel:  target5:0:8: Ending Domain Validation
Nov 13 17:07:10 filer1 kernel:  target5:0:8: asynchronous
Nov 13 17:07:10 filer1 kernel: SCSI device sda: 3417931776 512-byte hdwr sectors (1749981 MB)
Nov 13 17:07:10 filer1 kernel: sda: Write Protect is off
Nov 13 17:07:10 filer1 kernel: sda: Mode Sense: ad 00 00 08
Nov 13 17:07:10 filer1 kernel: SCSI device sda: drive cache: write back
Nov 13 17:07:10 filer1 kernel: SCSI device sda: 3417931776 512-byte hdwr sectors (1749981 MB)
Nov 13 17:07:10 filer1 kernel: sda: Write Protect is off
Nov 13 17:07:10 filer1 kernel: sda: Mode Sense: ad 00 00 08
Nov 13 17:07:10 filer1 kernel: SCSI device sda: drive cache: write back
Nov 13 17:07:10 filer1 kernel:  sda: sda1
Nov 13 17:07:10 filer1 kernel: sd 5:0:8:0: Attached scsi disk sda
Nov 13 17:07:10 filer1 kernel:  target5:0:8: Beginning Domain Validation
Nov 13 17:07:22 filer1 kernel: mptscsih: ioc3: attempting task abort! (sc=f77a5e00)
Nov 13 17:07:22 filer1 kernel: sd 5:0:8:0:
Nov 13 17:07:22 filer1 kernel:         command: Inquiry: 12 00 00 00 90 00
Nov 13 17:07:24 filer1 kernel: mptbase: Initiating ioc3 recovery
Nov 13 17:07:30 filer1 kernel: mptscsih: ioc3: task abort: SUCCESS (sc=f77a5e00)
Nov 13 17:07:30 filer1 kernel:  target5:0:8: Domain Validation detected failure, dropping back
Nov 13 17:07:30 filer1 kernel:  target5:0:8: Domain Validation skipping write tests
Nov 13 17:07:30 filer1 kernel:  target5:0:8: Ending Domain Validation
Nov 13 17:07:30 filer1 kernel:  target5:0:8: asynchronous
Nov 13 17:07:30 filer1 kernel:  target5:0:8: Beginning Domain Validation
Nov 13 17:07:40 filer1 kernel: mptscsih: ioc3: attempting task abort! (sc=f7a3e980)
Nov 13 17:07:40 filer1 kernel: sd 5:0:8:0:
Nov 13 17:07:40 filer1 kernel:         command: Inquiry: 12 00 00 00 90 00
Nov 13 17:07:42 filer1 kernel: mptbase: Initiating ioc3 recovery
Nov 13 17:07:48 filer1 kernel: mptscsih: ioc3: task abort: SUCCESS (sc=f7a3e980)
Nov 13 17:07:48 filer1 kernel:  target5:0:8: Domain Validation detected failure, dropping back
Nov 13 17:07:48 filer1 kernel:  target5:0:8: Domain Validation skipping write tests
Nov 13 17:07:48 filer1 kernel:  target5:0:8: Ending Domain Validation
Nov 13 17:07:48 filer1 kernel:  target5:0:8: asynchronous
Nov 13 17:07:48 filer1 kernel:  target5:0:8: Beginning Domain Validation
Nov 13 17:07:58 filer1 kernel: mptscsih: ioc3: attempting task abort! (sc=f7a21c80)
Nov 13 17:07:58 filer1 kernel: sd 5:0:8:0:
Nov 13 17:07:58 filer1 kernel:         command: Inquiry: 12 00 00 00 90 00
Nov 13 17:08:00 filer1 kernel: mptbase: Initiating ioc3 recovery
Nov 13 17:08:06 filer1 kernel: mptscsih: ioc3: task abort: SUCCESS (sc=f7a21c80)
Nov 13 17:08:06 filer1 kernel:  target5:0:8: Domain Validation detected failure, dropping back
Nov 13 17:08:06 filer1 kernel:  target5:0:8: Domain Validation skipping write tests
Nov 13 17:08:06 filer1 kernel:  target5:0:8: Ending Domain Validation
Nov 13 17:08:06 filer1 kernel:  target5:0:8: asynchronous
Nov 13 17:08:06 filer1 kernel:  target5:0:8: Beginning Domain Validation
Nov 13 17:08:16 filer1 kernel: mptscsih: ioc3: attempting task abort! (sc=f7a3e200)
Nov 13 17:08:16 filer1 kernel: sd 5:0:8:0:
Nov 13 17:08:16 filer1 kernel:         command: Inquiry: 12 00 00 00 90 00
Nov 13 17:08:19 filer1 kernel: mptbase: Initiating ioc3 recovery
[then it goes on and on...]
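
As a rough way to confirm the loop from a saved log file, the following
Python sketch counts how many times Domain Validation starts for each
target. The function name count_dv_starts and the regular expression are
illustrative assumptions, not part of any kernel tool; the sample lines are
copied from the extract above.

```python
import re
from collections import Counter

# Matches lines such as
# "... kernel:  target5:0:8: Beginning Domain Validation"
DV_START = re.compile(r"(target\d+:\d+:\d+): Beginning Domain Validation")


def count_dv_starts(lines):
    """Return a Counter mapping each target to its number of DV starts."""
    counts = Counter()
    for line in lines:
        match = DV_START.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the 2.6.18.1 log extract (hypothetical subset).
sample = [
    "Nov 13 17:06:52 filer1 kernel:  target5:0:8: Beginning Domain Validation",
    "Nov 13 17:07:10 filer1 kernel:  target5:0:8: Ending Domain Validation",
    "Nov 13 17:07:10 filer1 kernel:  target5:0:8: Beginning Domain Validation",
    "Nov 13 17:07:30 filer1 kernel:  target5:0:8: Beginning Domain Validation",
]

counts = count_dv_starts(sample)
for target, n in counts.items():
    print(target, n)
```

A target whose count keeps growing while the log is re-read would be
consistent with the endless retry described here; a healthy device would
normally be expected to show a single start.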

The same kernel logs with 2.6.8-2 on the same machine:

Nov 14 09:51:14 filer1 kernel: Fusion MPT base driver 3.01.09
Nov 14 09:51:14 filer1 kernel: Copyright (c) 1999-2004 LSI Logic Corporation
Nov 14 09:51:14 filer1 kernel: ACPI: PCI interrupt 0000:06:01.0[A] -> GSI 48 (level, low) -> IRQ 201
Nov 14 09:51:14 filer1 kernel: mptbase: Initiating ioc0 bringup
Nov 14 09:51:14 filer1 kernel: ioc0: 53C1030: Capabilities={Initiator,Target}
Nov 14 09:51:14 filer1 kernel: ACPI: PCI interrupt 0000:09:01.0[A] -> GSI 76 (level, low) -> IRQ 209
Nov 14 09:51:14 filer1 kernel: mptbase: Initiating ioc1 bringup
Nov 14 09:51:14 filer1 kernel: ioc1: 53C1030: Capabilities={Initiator,Target}
Nov 14 09:51:14 filer1 kernel: ACPI: PCI interrupt 0000:02:03.0[A] -> GSI 24 (level, low) -> IRQ 50
Nov 14 09:51:14 filer1 kernel: mptbase: Initiating ioc2 bringup
Nov 14 09:51:14 filer1 kernel: ioc2: 53C1030: Capabilities={Initiator,Target}
Nov 14 09:51:14 filer1 kernel: ACPI: PCI interrupt 0000:02:03.1[B] -> GSI 25 (level, low) -> IRQ 58
Nov 14 09:51:14 filer1 kernel: mptbase: Initiating ioc3 bringup
Nov 14 09:51:14 filer1 kernel: ioc3: 53C1030: Capabilities={Initiator,Target}
Nov 14 09:51:14 filer1 kernel: Fusion MPT SCSI Host driver 3.01.09
Nov 14 09:51:14 filer1 kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=201
Nov 14 09:51:18 filer1 kernel: scsi3 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=209
Nov 14 09:51:22 filer1 kernel: scsi4 : ioc2: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=50
Nov 14 09:51:26 filer1 kernel: scsi5 : ioc3: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=58
Nov 14 09:51:27 filer1 kernel: Vendor: BROWNIE Model: 8600U3 Rev: 0001
Nov 14 09:51:27 filer1 kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 14 09:52:59 filer1 kernel: SCSI device sda: 3417931776 512-byte hdwr sectors (1749981 MB)
Nov 14 09:53:04 filer1 kernel: mptbase: ioc3: IOCStatus(0x0047): SCSI Protocol Error
Nov 14 09:53:27 filer1 last message repeated 4 times
Nov 14 09:53:27 filer1 kernel: SCSI device sda: drive cache: write back
Nov 14 09:53:27 filer1 kernel:  /dev/scsi/host5/bus0/target8/lun0: p1
Nov 14 09:53:27 filer1 kernel: Attached scsi disk sda at scsi5, channel 0, id 8, lun 0


And last, the kernel logs for another BR8600 attached to a server running
kernel 2.4.31 (with an Adaptec SCSI host adapter):

Oct 16 17:14:01 mpopu kernel: scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Oct 16 17:14:01 mpopu kernel:         <Adaptec AIC7902 Ultra320 SCSI adapter>
Oct 16 17:14:01 mpopu kernel:         aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
Oct 16 17:14:01 mpopu kernel:
Oct 16 17:14:01 mpopu kernel: scsi2 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Oct 16 17:14:01 mpopu kernel:         <Adaptec AIC7902 Ultra320 SCSI adapter>
Oct 16 17:14:01 mpopu kernel:         aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
Oct 16 17:14:01 mpopu kernel:
Oct 16 17:14:01 mpopu kernel: blk: queue f63fca18, I/O limit 4095Mb (mask 0xffffffff)
Oct 16 17:14:01 mpopu kernel: scsi1:A:8:0: DV failed to configure device. Please file a bug report against this driver.
Oct 16 17:14:01 mpopu kernel: (scsi1:A:8): 160.000MB/s transfers (80.000MHz DT, 16bit)
Oct 16 17:14:01 mpopu kernel:   Vendor: BROWNIE   Model: 8600U3            Rev: 0001
Oct 16 17:14:01 mpopu kernel:   Type:   Direct-Access                      ANSI SCSI revision: 03
Oct 16 17:14:01 mpopu kernel: blk: queue f63fc818, I/O limit 4095Mb (mask 0xffffffff)
Oct 16 17:14:01 mpopu kernel: Attached scsi disk sdb at scsi1, channel 0, id 8, lun 0
Oct 16 17:14:01 mpopu kernel: SCSI device sdb: 3417931776 512-byte hdwr sectors (1749981 MB)
Oct 16 17:14:01 mpopu kernel:  sdb: sdb1


--
		Etienne Vogt (Etienne.Vogt@xxxxxxxx)
		Observatoire de Paris-Meudon, France