On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> [ccing linux-ide]
>
> Please provide the part of kernel log showing initialization of your
> disk controller(s) as well as detection of all the discs.
> Verbose lspci output for the disc controller and $(smartctl -i -A $disk)
> output might be useful as well.
>
> Did you try the individual discs on a completely different system (e.g.
> plain desktop system) and what revision of SATA are both components
> supporting?
>
> Bruno
>
>
> On Mon, 20 December 2010 Rogier Wolff <R.E.Wolff@xxxxxxxxxxxx> wrote:
>> Hi,
>>
>> A friend of mine has a server in a datacenter somewhere. His machine
>> is not working properly: most of his disks take 10-100 times longer
>> to process each IO request than normal.
>>
>> iostat -kx 10 output:
>> Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
>> sdd          0.30     0.00     0.40     1.20     2.80     1.10     4.88     0.43   271.50   271.44    43.43
>>
>> shows that in this 10 second period, the disk was busy for 4.3 seconds
>> and serviced 15-16 requests during that time.
>>
>> Normal disks show "svctm" of around 10-20ms.
>>
>> Now you might say: It's his disk that's broken.
>> Well no: I don't believe that all four of his disks are broken.
>> (I just showed you output about one disk, but there are 4 disks in there
>> all behaving similarly, but some are worse than others.)
>>
>> Or you might say: It's his controller that's broken. So we thought
>> too. We replaced the onboard sata controller with a 4-port sata
>> card. Now they are running off the external sata card... Slightly
>> better, but not by much.
>>
>> Or you might say: it's hardware. But suppose the disk doesn't properly
>> transfer the data 9 times out of 10, wouldn't the driver tell us
>> SOMETHING in the syslog that things are not fine and dandy? Moreover,
>> in the case above, 12kb were transferred in 4.3 seconds.
>> If CRC errors were happening, the interface would've been able to
>> transfer over 400Mb during that time. So every transfer would need to
>> be retried on average 30000 times... Not realistic. If that were the
>> case, we'd surely hit a maximum retry limit every now and then?
>>
>>
>> These symptoms started when the system was running 2.6.33, but are
>> still present now the system has been upgraded to 2.6.36.
>>
>> Is there anything you can suggest to get to the root of this problem?
>> Could this be a software issue with the driver? Can we enable some
>> driver debugging to find out what is wrong?
>>
>> Any help will be appreciated.
>>
>> Roger.

My personal guess would definitely be hardware.

The only common component I can think of is power. SATA is very
sensitive to power quality, much more so than IDE.

Greg
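
For reference, the arithmetic behind the quoted iostat line can be re-derived in a few lines of Python. This is only a sketch: the variable names are made up for illustration, the iostat fields are copied from the sdd row quoted above, and the 12 kB and 400 MB figures are taken from the original message as given rather than re-derived.

```python
# Fields copied from the quoted `iostat -kx 10` row for sdd.
INTERVAL_S = 10.0        # sampling interval passed to iostat
reads_per_s = 0.40       # r/s
writes_per_s = 1.20      # w/s
util_percent = 43.43     # %util
reported_svctm_ms = 271.44

# Requests completed and time the device was busy during the interval.
requests = (reads_per_s + writes_per_s) * INTERVAL_S
busy_s = util_percent / 100.0 * INTERVAL_S

# Average service time per request, to compare with the svctm column.
derived_svctm_ms = busy_s / requests * 1000.0

# Retry estimate, using the figures quoted in the message as given
# (assumption: decimal units, 1 MB = 1000 kB).
transferred_kb = 12.0
possible_mb = 400.0
implied_retries = possible_mb * 1000.0 / transferred_kb

print(f"requests in interval: {requests:.1f}")
print(f"busy time:            {busy_s:.2f} s")
print(f"derived svctm:        {derived_svctm_ms:.1f} ms "
      f"(iostat reported {reported_svctm_ms} ms)")
print(f"implied retries:      {implied_retries:.0f}")
```

The derived service time agrees with the svctm column, and the retry ratio lands in the same range as the "30000 times" quoted above.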