Re: Marvel 88SE6121 fails with SATA-2/3 HDDs

Hajo Noerenberg <hajo-linux-ide@xxxxxxxxxxxxx> · Tue, 2 Jul 2024 11:32:21 +0200

On 13.02.2023 at 02:28 Damien Le Moal wrote:
> On 2/1/23 19:02, Hajo Noerenberg wrote:
>> On 31.01.2023 at 03:34, Damien Le Moal wrote:
>>> On 1/30/23 22:40, Hajo Noerenberg wrote:
>>>> Summary: With U-Boot and kernels <3.16 the drives work, even without jumper.
>>>> I wonder if there is a way to get the drives working with up to date kernels.
>>>> This would have the benefit of a.) no need to set jumpers and b.) getting
>>>> bigger/newer drives like the WD30EFRX to work which probably do not have a
>>>> downgrade-jumper.
>>>

Sorry to revive this old thread, but it took me a really long time to
find out anything of substance.

Just to summarize again: Gen2/3 HDDs only work with the 88SE6121 controller
in the Seagate Blackarmor NAS 440 [1] if they are jumpered to Gen1 (1.5 Gbit/s).
This is unsatisfactory because they work correctly with the U-Boot bootloader
without any jumpers at Gen2 speed (3 Gbit/s).

>> I forgot to mention the main benefit: Without the "downgrade-jumper" the
>> drives are able to run at SATA-2 speed (the 88SE6121 is a SATA-2
>> controller). At least with kernel 2.6.x (ahci module) one can see the
>> ST3500418AS running at 3Gbps:
>>
>> [  151.957573] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [  151.958713] ata1.00: ATA-8: ST3500418AS, CC38, max UDMA/133
>> [  151.958726] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
>> [  151.960062] ata1.00: configured for UDMA/133
>> [  151.960397] scsi 0:0:0:0: Direct-Access     ATA      ST3500418AS      CC38 PQ: 0 ANSI: 5
>>
>> And with kernel 2.6.x even the SATA-3 WD30EFRX runs at 3Gbps as well (no
>> jumper, no kernel option) and has full 3TB accessible:
>>
>> [  100.497589] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [  100.498145] ata1.00: HPA detected: current 5860531055, native 5860533168
>> [  100.498165] ata1.00: ATA-9: WDC WD30EFRX-68EUZN0, 80.00A80, max UDMA/133
>> [  100.498177] ata1.00: 5860531055 sectors, multi 0: LBA48 NCQ (depth 0/32)
>> [  100.498853] ata1.00: configured for UDMA/133
>> [  100.499187] scsi 0:0:0:0: Direct-Access     ATA      WDC WD30EFRX-68E 80.0 PQ: 0 ANSI: 5
>>
>>> Can you try with libata.force=nolpm ? A lot of old WD drives have broken LPM.
>>>
>>
>> libata.force=nolpm slightly changes the kernel log: the drive is basically
>> detected (the model name and drive geometry show up), but in the end it
>> fails:
>>

After many tests I can say that none of the kernel options I tried (e.g.
libata.force with nolpm, noncq, nodma, 1.5Gbps and almost all others) mitigates
the problem.

By chance I saw an old Debian kernel patch [2] which, when applied, makes Gen2
HDDs work reproducibly with 3.x kernels. After some more investigation
I figured out that similarly commenting out some lines in the interrupt handler in
libahci.c causes them to be recognized with kernel 6.x as well:

/*      if (sata_lpm_ignore_phy_events(&ap->link)) {
                status &= ~PORT_IRQ_PHYRDY;
                ahci_scr_write(&ap->link, SCR_ERROR, SERR_PHYRDY_CHG);
        }
*/

Interestingly, sata_lpm_ignore_phy_events() returns false in my setup, so the
ahci_scr_write() is never executed and cannot be what makes the difference. As far
as I can tell, it is rather the CPU cycles saved by the absence of this section
in the interrupt handler. At first I found it very hard to believe that commenting
out the section was responsible, but I have compiled several kernels that differ
only in this section: yes, it makes a difference.

To summarize, with sata_lpm_ignore_phy_events() commented out:

- with kernel 3.x HDDs are recognized (IDENTIFY 0xEC) and one can write large
  amounts of data to them without any problems.
- with kernel 6.x, identifying and writing data works "almost" every time, but it
  is not perfectly stable.
- for both 3.x and 6.x kernels, when I execute certain special commands
  (e.g. "hdparm -I"), the drive connection is reset but usually works afterwards.
- with kernel 2.x the hard disks always worked, which is reasonable, because there
  the interrupt handler never included a sata_lpm_ignore_phy_events() call.

I would be grateful if you could tell me whether and how this problem can be
fixed properly.

Hajo

[1] https://github.com/hn/seagate-blackarmor-nas?tab=readme-ov-file#nas-440-patch-details
[2] https://salsa.debian.org/kernel-team/linux/-/blob/debian/3.16.39-1_bpo70+1/debian/patches/debian/revert-libata-ignore-spurious-phy-event-on-lpm-polic.patch?ref_type=tags