Re: MD/RAID time out writing superblock

Tejun Heo <tj@xxxxxxxxxx> · Mon, 31 Aug 2009 17:10:43 +0900

Ric Wheeler wrote:
> On 08/27/2009 05:22 PM, Andrei Tanas wrote:
>> Hello,
>>
>> This is about the same problem that I wrote about two days ago (md gets
>> an error while writing the superblock and fails a hard drive).
>>
>> I've tried to figure out what's really going on, and as far as I can
>> tell, the disk doesn't really fail (as confirmed by multiple tests); it
>> times out trying to execute the ATA_CMD_FLUSH_EXT command
>> ("ata2.00: cmd ea..." in the log). The reason for this, I believe, is
>> that md_super_write queues the write command with the BIO_RW_SYNCIO flag.
>> As I wrote before, with a 32MB cache it is conceivable that it will take
>> the drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to
>> flush its buffers.
>>
>> Changing safe_mode_delay to a more conservative 2 seconds should
>> definitely help, but is it really necessary to write the superblock
>> synchronously when the array changes status from active to active-idle?
>>
>> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> [90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> [90307.328280] ata2.00: status: { DRDY }
>> [90307.328288] ata2: hard resetting link
>> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
>> [90317.377711] ata2: SRST failed (errno=-16)
>> [90317.377720] ata2: hard resetting link
>> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [90318.338026] ata2.00: configured for UDMA/133
>> [90318.338062] ata2: EH complete
>> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
>> [90318.370632] md: super_written gets error=-5, uptodate=0
>>
>>    
> 
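
For readers unfamiliar with libata's error output, the "cmd" and "res"
lines in the log above can be decoded mechanically. The sketch below is
only an illustration: the helper names are hypothetical, the lookup
tables are deliberately partial, and the constants are taken from the
kernel's include/linux/ata.h and include/linux/libata.h as I understand
them. It assumes the log lines are already unwrapped onto single lines.

```python
import re

# Partial tables; values as defined in include/linux/ata.h and
# include/linux/libata.h (only the entries relevant to this log).
ATA_COMMANDS = {0xE7: "FLUSH CACHE", 0xEA: "FLUSH CACHE EXT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
EMASK_BITS = {0x1: "device error", 0x2: "HSM violation",
              0x4: "timeout", 0x8: "media error"}


def _flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_cmd(line):
    """Name the ATA command in a libata 'cmd xx/...' log line."""
    opcode = int(re.search(r"cmd ([0-9a-f]{2})/", line).group(1), 16)
    return ATA_COMMANDS.get(opcode, "unknown (0x%02x)" % opcode)


def decode_res(line):
    """Return (status flags, error-mask flags) from a libata 'res' line."""
    m = re.search(r"res ([0-9a-f]{2})/.*Emask (0x[0-9a-f]+)", line)
    status = int(m.group(1), 16)
    emask = int(m.group(2), 16)
    return _flags(status, STATUS_BITS), _flags(emask, EMASK_BITS)


# The two lines from the log quoted above.
CMD_LINE = "[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
RES_LINE = ("[90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 "
            "Emask 0x4 (timeout)")

command = decode_cmd(CMD_LINE)
status_flags, emask_flags = decode_res(RES_LINE)
```

Decoded this way, the log agrees with the description: the command is
FLUSH CACHE EXT, the drive reported only DRDY, and the error mask is a
timeout rather than a device or media error.
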
> 30 seconds is a very long time for a drive to respond, but I think that
> your explanation fits the facts pretty well...

Even with a 32MB cache, 30 seconds should be more than enough.  It's not
like the drive is gonna do random writes for all of that.  It's likely
to make only a few strokes over the platter, and that really shouldn't
take very long.  I have yet to see an actual case where a properly
functioning drive timed out on a flush because the flush itself took
that long.
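
A rough back-of-the-envelope model makes the argument above concrete.
This is only a sketch: the 32MB cache and the 30-second SD_TIMEOUT come
from this thread, but the write rate and per-seek cost are assumed,
illustrative figures, not measurements of any particular drive, and the
function names are hypothetical.

```python
def flush_time_estimate(cache_bytes, write_rate, seeks, seek_time):
    """Crude estimate in seconds: transfer time plus head-positioning time."""
    return cache_bytes / write_rate + seeks * seek_time


def seeks_within_timeout(cache_bytes, write_rate, seek_time, timeout):
    """How many separate seeks fit in `timeout` under the same crude model."""
    return int((timeout - cache_bytes / write_rate) / seek_time)


CACHE_BYTES = 32 * 1024 * 1024   # 32MB cache, from the thread
SD_TIMEOUT_S = 30                # SD_TIMEOUT, from the thread
ASSUMED_WRITE_RATE = 50e6        # bytes/s; ASSUMPTION, illustrative only
ASSUMED_SEEK_S = 0.015           # seek + rotation per stroke; ASSUMPTION

# A few sorted strokes over the platter, as described above.
few_strokes = flush_time_estimate(CACHE_BYTES, ASSUMED_WRITE_RATE,
                                  10, ASSUMED_SEEK_S)

# A much less favourable layout: a thousand separate seeks.
many_seeks = flush_time_estimate(CACHE_BYTES, ASSUMED_WRITE_RATE,
                                 1000, ASSUMED_SEEK_S)

# Number of separate seeks needed before the model reaches the timeout.
seek_budget = seeks_within_timeout(CACHE_BYTES, ASSUMED_WRITE_RATE,
                                   ASSUMED_SEEK_S, SD_TIMEOUT_S)

print("few strokes: %.2f s" % few_strokes)
print("1000 seeks:  %.2f s" % many_seeks)
print("seeks that fit in %d s: %d" % (SD_TIMEOUT_S, seek_budget))
```

Under these assumed numbers the flush stays well below the timeout
unless the cached data is scattered across a couple of thousand separate
seeks, which is the fully random write pattern a drive flushing its own
cache is not expected to produce.
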

> The drive might take a longer time like this when doing error handling
> (sector remapping, etc), but then I would expect to see your remapped
> sector count grow.

Yes, this is a possibility and, according to the spec, libata EH should
be retrying flushes a few times before giving up, but I'm not sure
whether it's a good idea to keep retrying for several minutes either.
Is it?

Thanks.

-- 
tejun