2017-03-28 2:53 GMT+02:00 Matthias Peter Walther <m_walt11@xxxxxxxxxxxxxxx>:
> Hello,
>
> I'm new to this list and I signed up because I found an instability
> with the following SATA controller:
>
> Product name: Delock 89384 10 Port PCIe 2.0 x2 Low Profile retail
> Identifies as: 03:00.0 SATA controller: ASMedia Technology Inc. Device
> 0625 (rev 01)
> (PCIe to 10x SATA controller card)
>
> Problem description: The controller works and recognizes all my drives.
> But under heavy load, e.g. an mdadm RAID-6 resync or just a dd to a
> file, it keeps causing lockups and random link resets on multiple
> devices.
>
> I spent the last two weeks replacing components in this server; the
> controller is definitely the problem. Everything works fine with a
> Marvell 9215 controller, and I tried the controller with three different
> mainboards and kernel versions 3.2, 4.4 and 4.10. The controller or its
> kernel driver definitely causes these lockups. I made sure that all
> drives were properly connected. [Syslog attached at the bottom of this
> mail.]
>
> As I am an experienced Linux user, but new to this, first questions:
>
> Is this the right place to seek help?
>
> If not: Where might I get help with this?

This should go to linux-ide instead of linux-scsi. Maybe the maintainer
could give you some help.

You can also take a look at
https://ata.wiki.kernel.org/index.php/Libata_error_messages

> If so: Does anybody have an idea what might cause this problem?
>
> My abilities: I can test patches on the mainline kernel. I can't code,
> as I lack any kind of knowledge about the SATA standard. I have the
> controller card and an empty spare device here to run any kind of tests.
>
> Syslog of one of these resets: If the level of stress is high enough,
> they happen on all connected devices (seemingly at random) from different
> manufacturers (Western Digital and Seagate) and different models.
> So this is probably not a bug in the firmware of one of the drives.
>
> Log:
> Mar 24 09:01:43 Server1 kernel: [ 1807.338347] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> Mar 24 09:01:43 Server1 kernel: [ 1807.340701] ata3.00: failed command: FLUSH CACHE EXT
> Mar 24 09:01:43 Server1 kernel: [ 1807.343078] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Mar 24 09:01:43 Server1 kernel: [ 1807.343078]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Mar 24 09:01:43 Server1 kernel: [ 1807.349717] ata3.00: status: { DRDY }
> Mar 24 09:01:43 Server1 kernel: [ 1807.353029] ata3: hard resetting link
> Mar 24 09:01:43 Server1 kernel: [ 1807.665533] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> Mar 24 09:01:43 Server1 kernel: [ 1807.667000] ata3.00: configured for UDMA/133
> Mar 24 09:01:43 Server1 kernel: [ 1807.667007] ata3.00: retrying FLUSH 0xea Emask 0x4
> Mar 24 09:01:43 Server1 kernel: [ 1807.667164] ata3.00: device reported invalid CHS sector 0
> Mar 24 09:01:43 Server1 kernel: [ 1807.667183] ata3: EH complete
>
> Whenever such a lockup happens, the whole partition is neither readable
> nor writable for at least 90 seconds and sometimes several minutes. But
> the system never crashed. I tried to google the controller card, but
> didn't find much about it.
>
> Any advice would be much appreciated :).
>
> Greetings,
> Matthias
>

Cheers,
Jack
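
For reading the quoted log against the libata error messages page, here is a minimal Python sketch that decodes the numeric fields in the "res ... Emask" and "SStatus" lines. The helper names (decode_bits, decode_sstatus, decode_line) are made up for illustration. The bit tables are my reading of the AC_ERR_* flags in include/linux/libata.h, the ATA status register and the SATA SStatus register layout, so please check them against your kernel tree before relying on them.

```python
import re

# Assumption: mirrors the AC_ERR_* flags in include/linux/libata.h.
EMASK_BITS = {
    0x001: "device error",
    0x002: "HSM violation",
    0x004: "timeout",
    0x008: "media error",
    0x010: "ATA bus error",
    0x020: "host bus error",
    0x040: "internal error",
    0x080: "invalid argument",
    0x100: "unknown error",
}

# ATA status register bits (first byte of the "res" taskfile dump).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# SStatus SPD field: negotiated link speed.
LINK_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_sstatus(hex_text):
    """Split an SStatus value (printed in hex by the kernel) into its fields."""
    value = int(hex_text, 16)
    spd = (value >> 4) & 0xF
    return {
        "det": value & 0xF,          # 3 = device present, PHY link established
        "spd": LINK_SPEED.get(spd, "unknown"),
        "ipm": (value >> 8) & 0xF,   # 1 = interface in active state
    }


def decode_line(line):
    """Decode whichever known fields appear in one libata log line."""
    out = {}
    m = re.search(r"res ([0-9a-f]{2})/.*Emask (0x[0-9a-f]+)", line)
    if m:
        out["status"] = decode_bits(int(m.group(1), 16), STATUS_BITS)
        out["emask"] = decode_bits(int(m.group(2), 16), EMASK_BITS)
    m = re.search(r"SStatus ([0-9A-Fa-f]+)", line)
    if m:
        out["sstatus"] = decode_sstatus(m.group(1))
    return out


if __name__ == "__main__":
    print(decode_line("res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)"))
    print(decode_line("ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"))
```

Applied to the log above, this agrees with what the kernel already prints: the FLUSH CACHE EXT command timed out while the drive reported DRDY, and after the hard reset the link came back at 6.0 Gbps. It only restates the log in words and does not say whether the controller, the driver or the cabling is at fault.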