Re: getting I/O errors in super_written()...any ideas what would cause this?

Ric Wheeler <rwheeler@xxxxxxxxxx> · Mon, 03 Dec 2012 15:52:55 -0500

On 12/03/2012 03:44 PM, Chris Friesen wrote:
> On 12/03/2012 02:22 PM, Ric Wheeler wrote:
>> On 11/28/2012 03:29 PM, Chris Friesen wrote:
>>> On 11/28/2012 02:27 PM, Mathias Burén wrote:
>>>>
>>>> The drives look healthy, but am I reading that right? More than 10
>>>> self tests per hour?
>>>
>>> Yeah....we cranked it up to try and increase how frequently we see the
>>> problem.
>>>
>>> From what I understand normally it runs once a day.
>>>
>>> Chris
>>
>> Did the vendor suggest to you that running a self test on an active
>> drive would be OK? I would expect errors in this case - specifically
>> time outs....
>
> I'm not the main developer in that area, but from what I understand the code
> has been like this for ages.  (It's entirely possible we've been lucky up till
> now since we support limited hardware types.)
>
> The fact that you'd expect time outs is interesting--is that from the delay
> switching from doing the self-test to doing the actual request?
>
> Is the expectation that the OS should not be sending any other commands to the
> disk while doing the self-test?
>
> I was recently looking at the SCSI spec trying to learn a bit about this issue
> and the section on background self-test (spc-4, section 5.15.4.3) seems to
> indicate that a READ or WRITE command should cause the background self-test to
> be aborted and the command to be processed within 2 seconds.  In our case it
> doesn't seem to be aborting (at least it shows as "Completed" in smartctl)--is
> this expected?
>
> Thanks,
> Chris

I jumped into this thread late - can you repost detail on the specific drive and
HBA used here? In any case, it sounds like this is a better topic for the
linux-scsi or linux-ide list where most of the low level storage people lurk :)

Ric
