Hello Gwendal,

Thank you for your kind response.

(2012/03/26 00:28), Gwendal Grignou wrote:
> I reread your logs.
>
> Assuming you don't mind long boot from cold power, the remaining
> problem is with the 4 disk enclosures [ata7-ata10] on the second
> machines where the first disk is not found and boot from warm reboot
> is very long.

It is not only the first disk. Two or more HDDs are missing on every PMP.

> I try to understand why it works with the other 4 enclosures
> [ata5-ata6] on the first and second machines.
>
> Also, just to be sure I understand you configuration correctly, your
> second machine has 30 disks total, not 40:
> 2 direct on ata1.00 and ata1.01
> 8 on 2 enclosures [ 2 * 4] on ata5 and ata6
> 20 on 4 enclosures [ 4 * 5] on ata7 - ata10

Oops! You are right. I'm very sorry!

> Also, from the log, ata5 and ata6 is behind a Sil3132 based
> controller, while ata7-ata10 behind a single Sil3124, not the opposite
> as you said in a precedent mail.
>
> If possible, could you switch 2 of the 4 enclosures [with their disks]
> that fails to the port controlled by the Sil3132 controller, reboot
> the machine with all its 30 drives and see if the failures follow the
> controller or the enclosure.

Yes, I will do it as soon as possible.
(Sorry, a resync is running now.)

> If you based your raid configuration on signature that should be fine,
> but if it based on kernel device name [sdX] that will confuse md and
> will mess with your data.

The problem is that some HDDs on every PMP from ata7 - ata10 are missing.
The RAID problem seems to be caused by that.
mdadm.conf uses UUIDs, so I think the kernel identifies the arrays by their UUIDs.

Best Regards,
Akira

> I am sorry I don't have any other suggestion right now,

The HDDs connected to ata7 - ata10 are very old and support only
Serial ATA 1.0a. I checked the data sheet, and the chip (JM20330) supports
the SRST command. While booting, the indicator LED blinks repeatedly, and
more than half of the HDDs are identified.
So I think that the HDD side is not the problem.
What is your opinion about it?

> Regards,
> Gwendal.
>
> On Sat, Mar 24, 2012 at 6:19 PM, ANEZAKI, Akira
> <fireblade1230@xxxxxxxxxxx> wrote:
>> Hello Gwendal,
>>
>> I want to confirm one thing.
>> The kernel 3.1.x driver still works?
>>
>> It seems to take long time to solve the problem. Of course I understand
>> staggered spin-up is better solution. But I can't wait it so long. And
>> it affects only SiI3726 only.
>>
>> Best Regards,
>> Akira
>>
>> (2012/03/23 18:59), ANEZAKI, Akira wrote:
>>> Hello Gwendal,
>>>
>>> (2012/03/23 17:31), Gwendal Grignou wrote:
>>>>>>>> I notice however some messages I did not see before:
>>>>>>>>>> [ 4.856382] ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>>>>>>>>>> [ 4.858742] ata7.00: hard resetting link
>>>>>>>>>> [ 14.843039] ata7.00: softreset failed (timeout)
>>>>>>>>>> [ 17.836402] ata7.15: qc timeout (cmd 0xe4)
>>>>>>>> The later indicates that the PMP is stuck and the host can not read
>>>>>>>> its internal register.
>>>>>>>> Is it possible that the PMP in these 4 enclosures you are using have a
>>>>>>>> different firmware than the other ones?
>>>>>>>> Firmware 1.0114 is available at:
>>>>>>>> http://www.siliconimage.com/support/searchresults.aspx?pid=26&cat=23
>>>>>>>>
>>>>>>>> From the release notes:
>>>>>>>> """- Fix SRST and initial two RegFIS Problem."""
>>>>>
>>>>> I'm still fixing broken RAID. Sorry for my slow response.
>>>
>>> I checked those firmware version. All of them use version 1.0114.
>>>
>>> Best Regards,
>>> Akira