aacraid woes with kernel 4.14.48 to 4.16.14

Hi all,

Running plain vanilla kernels on Debian stable, on 2 different machines
with different hardware setups: 

 * production system: EPYC 16 cores, 64 GB RAM, ASR8805 (latest
   firmware): doesn't work at all with 4.14.48 and up
 * test system: Opteron dual core, 2 GB RAM, ASR7805, mostly works, but
   has problems with 4.16.14

Both systems run fine with kernels up to 4.13.16 (plain vanilla). With
4.14.48 and up, the aacraid driver doesn't work on the EPYC system. From
dmesg:

[   61.069190] Adaptec aacraid driver 1.2.1[50834]-custom
[   61.069527] aacraid 0000:21:00.0: SME is active, device will require DMA bounce buffers
[   61.076949] SME is active and system is using DMA bounce buffers
[   61.076954] aacraid: Comm Interface type2 enabled


Nothing else happens. Attached devices are unavailable. "arcconf" finds
no controller. "rmmod aacraid" doesn't work (device in use).

Kernel 4.16.14 behaves exactly the same:

[   40.380488] Adaptec aacraid driver 1.2.1[50877]-custom
[   40.380871] aacraid 0000:21:00.0: SME is active, device will require DMA bounce buffers
[   40.388991] SME is active and system is using DMA bounce buffers
[   40.388995] aacraid: Comm Interface type2 enabled

Compare with kernel 4.13.16 (dmesg):

[   25.286437] Adaptec aacraid driver 1.2.1[50834]-custom
[   25.293694] aacraid: Comm Interface type2 enabled
[   25.300799] AAC0: kernel 7.13-0[33263] Mar 16 2018
[   25.300801] AAC0: monitor 7.13-0[33263]
[   25.300802] AAC0: bios 7.13-0[33263]
[   25.300804] AAC0: serial 6A46639462D
[   25.300804] AAC0: Non-DASD support enabled.
[   25.300805] AAC0: 64bit support enabled.
[   25.300807] aacraid 0000:21:00.0: 64 Bit DAC enabled
...
[   27.307013] scsi host16: aacraid
[   27.307228] scsi 16:0:0:0: Direct-Access     ASR8805  LogicalDrv 0     V1.0 PQ: 0 ANSI: 2
[   27.307364] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   27.307376] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00 TB/5.45 TiB)
[   27.307385] sd 16:0:0:0: [sdm] Write Protect is off
[   27.307386] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
[   27.307403] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   27.307411] sd 16:0:0:0: Attached scsi generic sg12 type 0
[   27.307507] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   27.307830] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   27.307855] sd 16:0:0:0: [sdm] Attached SCSI removable disk
[   27.332731] scsi 16:1:0:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.355431] scsi 16:1:0:0: Attached scsi generic sg13 type 0
[   27.355830] scsi 16:1:1:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.385377] scsi 16:1:1:0: Attached scsi generic sg14 type 0
[   27.385788] scsi 16:1:2:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.413412] scsi 16:1:2:0: Attached scsi generic sg15 type 0
[   27.413813] scsi 16:1:3:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.437420] scsi 16:1:3:0: Attached scsi generic sg16 type 0
[   27.437836] scsi 16:1:4:0: Direct-Access     ATA      WDC WD10TPVT-00H 1A01 PQ: 1 ANSI: 6
[   27.636675] scsi 16:1:4:0: Attached scsi generic sg17 type 0
[   27.637077] scsi 16:1:5:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.664591] scsi 16:1:5:0: Attached scsi generic sg18 type 0
[   27.664996] scsi 16:1:6:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.697619] scsi 16:1:6:0: Attached scsi generic sg19 type 0
[   27.698027] scsi 16:1:7:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   27.732645] scsi 16:1:7:0: Attached scsi generic sg20 type 0

And 4.9.108 (different driver version, but the same outcome):

[   11.220997] Adaptec aacraid driver 1.2-1[41066]-ms
[   11.225669] AAC0: kernel 7.13-0[33263] Mar 16 2018
[   11.225671] AAC0: monitor 7.13-0[33263]
[   11.225672] AAC0: bios 7.13-0[33263]
[   11.225674] AAC0: serial 6A46639462D
[   11.225674] AAC0: Non-DASD support enabled.
[   11.225675] AAC0: 64bit support enabled.
[   11.225675] AAC0: 64 Bit DAC enabled
[   13.220045] scsi host16: aacraid
[   13.220082] aacraid 0000:21:00.0: DDR cache data recovered successfully
[   13.220286] scsi 16:0:0:0: Direct-Access     ASR8805  LogicalDrv 0     V1.0 PQ: 0 ANSI: 2
[   13.220371] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   13.220385] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00 TB/5.45 TiB)
[   13.220393] sd 16:0:0:0: [sdm] Write Protect is off
[   13.220395] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
[   13.220415] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   13.220499] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   13.220737] sd 16:0:0:0: [sdm] Very big device. Trying to use READ CAPACITY(16).
[   13.220774] sd 16:0:0:0: [sdm] Attached SCSI removable disk
[   13.221263] random: fast init done
[   13.280269] scsi 16:1:0:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.306349] scsi 16:1:1:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.337330] scsi 16:1:2:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.380291] scsi 16:1:3:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.518301] scsi 16:1:4:0: Direct-Access     ATA      WDC WD10TPVT-00H 1A01 PQ: 1 ANSI: 6
[   13.728354] scsi 16:1:5:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.756322] scsi 16:1:6:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
[   13.791346] scsi 16:1:7:0: Direct-Access     ATA      WDC WD10TPVT-00U 1A01 PQ: 1 ANSI: 6
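A quick way to tell the failing boots from the working ones is the firmware banner ("AAC0: kernel ..."): it appears on 4.13.16 and 4.9.108 but never on 4.14.48 or 4.16.14, where the log stops at "Comm Interface type2 enabled". The Python sketch below checks dmesg text for that banner. The function name `controller_initialized` is hypothetical, and treating the banner as the marker of a completed probe is an assumption based only on the logs quoted here.

```python
import re

# Firmware banner printed on successful probes in the logs above,
# e.g. "AAC0: kernel 7.13-0[33263] Mar 16 2018".
FIRMWARE_BANNER = re.compile(r"\bAAC\d+: kernel \S+")


def controller_initialized(dmesg: str) -> bool:
    """Heuristic (assumption): True if any aacraid adapter printed its firmware banner."""
    return any(FIRMWARE_BANNER.search(line) for line in dmesg.splitlines())


if __name__ == "__main__":
    # Excerpts from the 4.16.14 (failing) and 4.13.16 (working) logs.
    failing = (
        "[   40.380488] Adaptec aacraid driver 1.2.1[50877]-custom\n"
        "[   40.388995] aacraid: Comm Interface type2 enabled\n"
    )
    working = (
        "[   25.293694] aacraid: Comm Interface type2 enabled\n"
        "[   25.300799] AAC0: kernel 7.13-0[33263] Mar 16 2018\n"
    )
    print(controller_initialized(failing))  # False
    print(controller_initialized(working))  # True
```

On a live system the text can come from `dmesg` (which may require root) or `journalctl -k`.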

Apparently the problem may lie with "SME" and the "bounce buffers".
However, I don't know for sure whether SME and bounce buffers (whatever
they are) were disabled in 4.13 and earlier kernels, or whether it's
simply the warning that was enabled at some point.
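SME here is AMD Secure Memory Encryption, which the kernel controls with the `mem_encrypt=` boot parameter (values `on` or `off`). To compare boots, it can help to record both whether that parameter was passed and which devices printed the bounce-buffer warning. The sketch below does this from the text of `/proc/cmdline` and dmesg; the helper name `sme_report` is hypothetical, and it only collects hints from the logs, it does not prove SME is the cause.

```python
import re

# Per-device warning seen on 4.14.48 and 4.16.14 in the logs above.
BOUNCE_RE = re.compile(
    r"(\S+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"SME is active, device will require DMA bounce buffers"
)
# mem_encrypt= is the kernel boot parameter controlling AMD SME (on/off).
MEM_ENCRYPT_RE = re.compile(r"(?:^|\s)mem_encrypt=(\S+)")


def sme_report(cmdline: str, dmesg: str) -> dict:
    """Collect SME-related hints (heuristic) from /proc/cmdline and dmesg text."""
    param = MEM_ENCRYPT_RE.search(cmdline)
    return {
        # None means the parameter was not given, so the build default applies.
        "mem_encrypt_param": param.group(1) if param else None,
        # List of (driver, PCI address) pairs that need bounce buffers.
        "bounce_devices": BOUNCE_RE.findall(dmesg),
    }


if __name__ == "__main__":
    log = (
        "[   61.069527] aacraid 0000:21:00.0: SME is active, "
        "device will require DMA bounce buffers\n"
        "[   61.076949] SME is active and system is using DMA bounce buffers\n"
    )
    print(sme_report("root=/dev/sda1 ro quiet", log))
```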

Now to the older machine (little RAM, older RAID controller): the
situation is slightly different.

4.14.48 and 4.15.18 both run fine:

[    6.066699] Adaptec aacraid driver 1.2.1[50834]-custom
[    6.067097] aacraid: Comm Interface type2 enabled
[    6.085979] AAC0: kernel 7.5-0[32118] Mar 31 2018
[    6.085981] AAC0: monitor 7.5-0[32118]
[    6.085982] AAC0: bios 7.5-0[32118]
[    6.085984] AAC0: serial 5C1113B8ADE
[    6.085985] AAC0: Non-DASD support enabled.
[    6.089604] random: fast init done
[    6.099792] scsi host6: aacraid
[    6.100034] scsi 6:0:0:0: Direct-Access     ASR7805  yrdy             V1.0 PQ: 0 ANSI: 2
[    6.100261] sd 6:0:0:0: [sda] 3900682240 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    6.100276] sd 6:0:0:0: [sda] Write Protect is off
[    6.100278] sd 6:0:0:0: [sda] Mode Sense: 12 00 10 08
[    6.100296] sd 6:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[    6.123005] scsi 6:1:4:0: Direct-Access     ATA      Hitachi HDS72302 MN6O PQ: 1 ANSI: 6
[    6.125896] sd 6:0:0:0: [sda] Attached SCSI removable disk

However, with 4.16.14 the controller hangs and resets constantly:

[   78.830864] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (6,0,0,0):
[   78.835871] aacraid: Host adapter reset request. SCSI hang ?
[   94.181485] aacraid: Host adapter reset request. SCSI hang ?
[   94.181493] aacraid 0000:01:00.0: outstanding cmd: midlevel-0
[   94.181494] aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
[   94.181495] aacraid 0000:01:00.0: outstanding cmd: error handler-0
[   94.181497] aacraid 0000:01:00.0: outstanding cmd: firmware-1
[   94.181498] aacraid 0000:01:00.0: outstanding cmd: kernel-0
[   94.181514] aacraid 0000:01:00.0: Controller reset type is 3
[   94.181515] aacraid 0000:01:00.0: Issuing IOP reset
[  120.593352] aacraid 0000:01:00.0: IOP reset succeded
[  120.605025] aacraid: Comm Interface type2 enabled
[  121.660829] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  121.660976] NFSD: starting 90-second grace period (net f0000099)
[  128.984735] mgag200 0000:08:04.0: Video card doesn't support cursors with partial transparency.
[  128.984738] mgag200 0000:08:04.0: Not enabling hardware cursor.
[  133.824432] aacraid 0000:01:00.0: Scheduling bus rescan
[  193.523141] aacraid: Host adapter reset request. SCSI hang ?
[  193.523150] aacraid 0000:01:00.0: outstanding cmd: midlevel-0
[  193.523152] aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
[  193.523153] aacraid 0000:01:00.0: outstanding cmd: error handler-0
[  193.523154] aacraid 0000:01:00.0: outstanding cmd: firmware-1
[  193.523155] aacraid 0000:01:00.0: outstanding cmd: kernel-0
[  193.523178] aacraid 0000:01:00.0: Controller reset type is 3
[  193.523179] aacraid 0000:01:00.0: Issuing IOP reset
[  219.910222] aacraid 0000:01:00.0: IOP reset succeded
[  219.915742] aacraid: Comm Interface type2 enabled
[  233.140378] aacraid 0000:01:00.0: Scheduling bus rescan
[  246.771302] INFO: task kworker/u24:3:79 blocked for more than 120 seconds.
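The reset loop is fairly regular: each "Issuing IOP reset" is followed by "IOP reset succeded" (spelled that way by the driver) roughly 26 seconds later. The sketch below pairs those two messages in dmesg text and returns the durations, which may help when comparing kernels or firmware versions. The function name `iop_reset_durations` is hypothetical; lines without a `[timestamp]` prefix, such as wrapped continuation lines, are skipped.

```python
from __future__ import annotations

import re

# "[  120.593352] message" -> (120.593352, "message")
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def iop_reset_durations(dmesg: str) -> list[float]:
    """Seconds between each 'Issuing IOP reset' and the following success message."""
    durations: list[float] = []
    started: float | None = None
    for line in dmesg.splitlines():
        m = TS_RE.match(line.strip())
        if not m:
            continue  # continuation lines carry no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if msg.endswith("Issuing IOP reset"):
            started = ts
        elif "IOP reset succeded" in msg and started is not None:
            durations.append(ts - started)
            started = None
    return durations


if __name__ == "__main__":
    log = """[   94.181515] aacraid 0000:01:00.0: Issuing IOP reset
[  120.593352] aacraid 0000:01:00.0: IOP reset succeded
[  193.523179] aacraid 0000:01:00.0: Issuing IOP reset
[  219.910222] aacraid 0000:01:00.0: IOP reset succeded"""
    print([round(d, 2) for d in iop_reset_durations(log)])
```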


-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02


