Re: Marvel 88SE6121 fails with SATA-2/3 HDDs

On 7/2/24 18:32, Hajo Noerenberg wrote:
> On 13.02.2023 at 02:28 Damien Le Moal wrote:
>> On 2/1/23 19:02, Hajo Noerenberg wrote:
>>> On 31.01.2023 at 03:34 Damien Le Moal wrote:
>>>> On 1/30/23 22:40, Hajo Noerenberg wrote:
>>>>> Summary: With U-Boot and kernels <3.16 the drives work, even without jumper.
>>>>> I wonder if there is a way to get the drives working with up to date kernels.
>>>>> This would have the benefit of a.) no need to set jumpers and b.) getting
>>>>> bigger/newer drives like the WD30EFRX to work which probably do not have a
>>>>> downgrade-jumper.
>>>>
> 
> Sorry to reactivate this old thread, but it took me a really long time to
> find out anything of substance.
> 
> Just to summarize again: Gen2/3 HDDs only work with the 88SE6121 controller
> in the Seagate Blackarmor NAS 440 [1] if they are jumpered to Gen1 (1.5 Gbit/s).
> This is unsatisfactory because they correctly work with the U-Boot bootloader
> without any jumpers at Gen2 speed (3 Gbit/s).
> 
> 
>>> I forgot to mention the main benefit: Without the "downgrade-jumper" the drives are able to run at SATA-2 speed (the 88SE6121 is a SATA-2 controller). At least with kernel 2.6.x (ahci module) one can see the ST3500418AS running at 3Gbps:
>>>
>>> [  151.957573] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [  151.958713] ata1.00: ATA-8: ST3500418AS, CC38, max UDMA/133
>>> [  151.958726] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
>>> [  151.960062] ata1.00: configured for UDMA/133
>>> [  151.960397] scsi 0:0:0:0: Direct-Access     ATA      ST3500418AS      CC38 PQ: 0 ANSI: 5
>>>
>>> And with kernel 2.6.x even the SATA-3 WD30EFRX runs at 3Gbps as well (no jumper, no kernel option) and has full 3TB accessible:
>>>
>>> [  100.497589] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [  100.498145] ata1.00: HPA detected: current 5860531055, native 5860533168
>>> [  100.498165] ata1.00: ATA-9: WDC WD30EFRX-68EUZN0, 80.00A80, max UDMA/133
>>> [  100.498177] ata1.00: 5860531055 sectors, multi 0: LBA48 NCQ (depth 0/32)
>>> [  100.498853] ata1.00: configured for UDMA/133
>>> [  100.499187] scsi 0:0:0:0: Direct-Access     ATA      WDC WD30EFRX-68E 80.0 PQ: 0 ANSI: 5
>>>
>>>> Can you try with libata.force=nolpm ? A lot of old WD drives have broken LPM.
>>>>
>>>
>>> libata.force=nolpm slightly changes the kernel log: the drive is basically detected (the model name and drive geometry show up), but in the end it fails:
>>>
> 
> After many many tests I can say that no kernel option I tried (e.g. libata.force with
> nolpm, noncq, nodma, 1.5Gbps and almost all others) helps to mitigate the problem.
> 
> By chance I saw an old Debian kernel patch [2], which, when applied, makes Gen2
> HDDs work reproducibly with 3.x kernels. After some more investigation
> I figured out that similarly commenting out some lines in the interrupt handler in
> libahci.c causes them to be recognized with kernel 6.x as well:
> 
> /*      if (sata_lpm_ignore_phy_events(&ap->link)) {
>                 status &= ~PORT_IRQ_PHYRDY;
>                 ahci_scr_write(&ap->link, SCR_ERROR, SERR_PHYRDY_CHG);
>         }
> */
> 
> Interestingly, sata_lpm_ignore_phy_events() returns false in my setup. So, as far as
> I can tell, it is not a question of the ahci_scr_write() being executed. Rather, it
> is the CPU cycles that are saved by the absence of this section in the interrupt
> handler. At first it was very hard for me to believe that it was due to commenting
> out the section, but I have compiled several kernels that differ
> only in this section: yes, it makes a difference.

That is very odd. sata_lpm_ignore_phy_events() is only a couple of "if"
statements and there are no register accesses in there. So if the few CPU cycles
it takes make a difference, I would suspect that there is something odd going
on with the Marvell adapter interrupts.
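
To illustrate why such a check should cost next to nothing, here is a rough
user-space sketch of a filter of this shape. It is not the libata source: the
struct, the enum, the tick window and the bit position of MODEL_IRQ_PHYRDY are
all hypothetical stand-ins chosen for illustration, and the real code may
differ between kernel versions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. */
enum model_lpm_policy { MODEL_LPM_UNKNOWN, MODEL_LPM_MAX_POWER, MODEL_LPM_MIN_POWER };

struct model_link {
    enum model_lpm_policy lpm_policy; /* current link power management policy */
    bool lpm_changed;                 /* policy was changed recently */
    unsigned long last_lpm_change;    /* tick at which the policy last changed */
};

#define MODEL_SPURIOUS_TICKS 10UL     /* assumed grace window, in ticks */
#define MODEL_IRQ_PHYRDY (1u << 22)   /* illustrative "PhyRdy changed" bit */

/* Two plain comparisons on in-memory fields, no register access. */
static bool model_ignore_phy_events(const struct model_link *link, unsigned long now)
{
    /* With LPM active, a PhyRdy change carries no useful information. */
    if (link->lpm_policy > MODEL_LPM_MAX_POWER)
        return true;

    /* Shortly after a policy change, a PhyRdy event may be spurious. */
    if (link->lpm_changed && (now - link->last_lpm_change) < MODEL_SPURIOUS_TICKS)
        return true;

    return false;
}

/* Mirrors the shape of the commented-out hunk: drop the PhyRdy bit if ignored. */
static uint32_t model_filter_status(const struct model_link *link,
                                    unsigned long now, uint32_t status)
{
    if (model_ignore_phy_events(link, now))
        status &= ~MODEL_IRQ_PHYRDY;
    return status;
}
```

With the policy at maximum power and no recent change, model_filter_status()
returns the status untouched, which matches the "returns false" case described
in the report.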

> To summarize, with sata_lpm_ignore_phy_events() commented out:
> 
> - with kernel 3.x HDDs are recognized (IDENTIFY 0xEC) and one can write large
>   amounts of data to them without any problems.
> - for kernel 6.x identifying and writing data works "almost" every time but not
>   perfectly stable.

So commenting out that "if (sata_lpm_ignore_phy_events)" hunk is not enough to
fix your issue then. This hunk may not be directly related to the issue, and
commenting it out may simply change the timing in a way that makes things better.

> - for both 3.x and 6.x kernels, when I execute certain special commands
>   (e.g. "hdparm -I"), the drive connection is reset but usually works afterwards.
> - with kernel 2.x the hard disks always worked, which is reasonable, because there
>   the interrupt handler never included a sata_lpm_ignore_phy_events() call.

But above, you said that things are not completely stable with 6.x. So there is
likely something else going on.

> I would be thankful if you could tell me whether and how this problem can be
> solved sustainably.

First things first: can you please test with the latest mainline 6.10-rc6 kernel
and send the dmesg output after boot, plus any other relevant output showing
problems when doing I/Os?
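
When comparing dmesg outputs, the hexadecimal SStatus value in the "SATA link
up" lines (e.g. "SStatus 123" in the logs quoted above) can be split into its
fields. The following is a small sketch based on the SStatus layout in the SATA
specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the helper
names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Decoded SStatus fields (hypothetical helper type). */
struct sstatus_fields {
    unsigned det; /* device detection: 3 = device present, PHY link established */
    unsigned spd; /* negotiated speed generation */
    unsigned ipm; /* interface power state: 1 = active */
};

/* Split a raw SStatus value into its three 4-bit fields. */
static struct sstatus_fields decode_sstatus(uint32_t raw)
{
    struct sstatus_fields f;

    f.det = raw & 0xf;
    f.spd = (raw >> 4) & 0xf;
    f.ipm = (raw >> 8) & 0xf;
    return f;
}

/* Map the SPD field to the link speed string printed by the kernel. */
static const char *sstatus_speed_name(unsigned spd)
{
    switch (spd) {
    case 1:
        return "1.5 Gbps";
    case 2:
        return "3.0 Gbps";
    case 3:
        return "6.0 Gbps";
    default:
        return "unknown";
    }
}
```

For SStatus 0x123 this gives DET=3, SPD=2 and IPM=1, i.e. a device is present
and the link is active at 3.0 Gbps, consistent with the "SATA link up 3.0 Gbps"
message.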

> 
> Hajo
> 
> 
> [1] https://github.com/hn/seagate-blackarmor-nas?tab=readme-ov-file#nas-440-patch-details
> [2] https://salsa.debian.org/kernel-team/linux/-/blob/debian/3.16.39-1_bpo70+1/debian/patches/debian/revert-libata-ignore-spurious-phy-event-on-lpm-polic.patch?ref_type=tags
> 
> 

-- 
Damien Le Moal
Western Digital Research




