I have a rather busy MythTV system here, with four tuners and a high-res HDTV for the display. It uses a pair of 750GB Hitachi SATA drives (RAID0) for storage.

I wanted to see how warm the drives get, so I set up a monitoring program that invokes hddtemp every 20-30 seconds or so, updating a front panel display with the current drive temperature of /dev/sdb.

So far, so good. But when the machine is busy recording a hi-def (17 Mbit/sec) stream whilst also playing back a hi-def stream, libata locks up and resets /dev/sdb periodically, roughly once a minute (quite irregular). This causes lots of recording bits to be dropped, ruining later playback.

The dual-core system was running 2.6.24.3 (32-bit) at the time, with libata.ahci w/NCQ on both drives. I retested with "hdparm -Q1 /dev/sd?" and it didn't help: same problem.

The system logs and a full S.M.A.R.T. test show both drives to be clean (no media faults found), other than libata reporting timeouts like this:

..
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
         res 40/00:00:00:4f:c2/00:01:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: soft resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 1465149168 512-byte hardware sectors (750156 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
..

That's an IDENTIFY (0xEC) command timing out.

The hddtemp program does its work by issuing IDENTIFY and SMART commands to the target drive, /dev/sdb in this case. From strace:

ioctl(3, 0x30d, 0xbfd2c418)
ioctl(3, 0x31f, 0xbfd2c60c)
ioctl(3, 0x31f, 0xbfd2c614)
ioctl(3, 0x31f, 0xbfd2c408)

So that 0xEC most likely came from the hddtemp program, since libata doesn't normally issue them after probing.

So why is it timing out?
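As an aside, the raw request numbers in the strace output can be decoded against <linux/hdreg.h>. The following is only a small sketch of that lookup; the function name hdio_name() is made up for illustration, and it covers just the two request numbers seen in the trace.

```c
#include <linux/hdreg.h>

/*
 * Map an ioctl request number from the strace output to its HDIO_* name.
 * Only the two requests seen in the trace are handled; hdio_name() is a
 * hypothetical helper, not part of hddtemp or the kernel.
 */
static const char *hdio_name(unsigned long request)
{
    switch (request) {
    case HDIO_GET_IDENTITY: /* 0x030d */
        return "HDIO_GET_IDENTITY";
    case HDIO_DRIVE_CMD:    /* 0x031f */
        return "HDIO_DRIVE_CMD";
    default:
        return "(not decoded here)";
    }
}
```

So the first ioctl in the trace is HDIO_GET_IDENTITY, and the three that follow are HDIO_DRIVE_CMD requests.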
Well, these drives have 32MB onboard caches, and I'm guessing that something (firmware, whatever) tries to empty that cache before processing the issued IDENTIFY command. And we time out before the drive has a chance to actually process the IDENTIFY.

This is a problem for libata, one that will become more common as drives get larger and larger caches.

Fiddling with timeouts isn't really the greatest solution, though it could help.

A more deterministic solution might be to issue a CACHE FLUSH command ahead of any ATA_16 command (perhaps other than a R/W command?), so that the timeout for the ATA_16 command (IDENTIFY in this case) won't have to account for an unpredictable write cache on the drive.

Or maybe there's a better way. Suggestions?

Meanwhile, I'm installing 2.6.26.2 on the box later today, but don't really expect anything to be different with the newer kernel.
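To make the flush-first idea concrete, here is a rough userspace sketch of the same ordering: FLUSH CACHE (0xE7) first, then IDENTIFY DEVICE (0xEC), both sent through HDIO_DRIVE_CMD. This is not the in-kernel change proposed above and it is untested against the failing setup; the function name flush_then_identify() and the CMD_* constants are invented for illustration. Note that the flush itself can still take a long time on a busy drive.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

/* ATA opcodes; the constant names are local to this sketch. */
enum {
    CMD_FLUSH_CACHE = 0xE7,  /* FLUSH CACHE */
    CMD_IDENTIFY    = 0xEC   /* IDENTIFY DEVICE */
};

/*
 * Hypothetical helper: flush the drive's write cache, then fetch the
 * 512-byte IDENTIFY data into id[].
 * Returns 0 on success, -1 on failure (errno comes from open() or ioctl()).
 */
static int flush_then_identify(const char *dev, unsigned char id[512])
{
    /* HDIO_DRIVE_CMD header: command, sector, feature, sector count */
    unsigned char flush[4] = { CMD_FLUSH_CACHE, 0, 0, 0 };
    unsigned char ident[4 + 512];   /* 4-byte header + one data sector */
    int fd, rc = -1;

    fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Step 1: non-data FLUSH CACHE, so the write cache is empty. */
    if (ioctl(fd, HDIO_DRIVE_CMD, flush) == 0) {
        /* Step 2: IDENTIFY, asking for one 512-byte sector back. */
        memset(ident, 0, sizeof(ident));
        ident[0] = CMD_IDENTIFY;
        ident[3] = 1;
        if (ioctl(fd, HDIO_DRIVE_CMD, ident) == 0) {
            memcpy(id, ident + 4, 512);  /* data follows the header */
            rc = 0;
        }
    }

    close(fd);
    return rc;
}
```

A monitoring tool could call flush_then_identify("/dev/sdb", buf) in place of a bare IDENTIFY. Whether that actually avoids the reset depends on the guess about the cache being right.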