Re: URE, link resets, user hostile defaults

Hannes Reinecke <hare@xxxxxxx> · Mon, 4 Jul 2016 08:00:43 +0200

On 07/01/2016 10:43 PM, Chris Murphy wrote:
> Here's a fun one of these I just got off the Fedora users mailing list
> with a laptop drive that's apparently hanging on *write*. This I would
> not expect to take a long time for a drive to figure out, but... there
> are more resets than there are write errors, and in fact there's no
> discrete write error from the drive, all we know is the failed command
> is a WRITE command.
> 
> What seems to happen is, everything in the queue gets obliterated in
> the reset, and when ext4 finds out everything failed, not just one
> write, it barfs and goes read only.
> 
> http://pastebin.com/3JAL297z
> 
> How might this turn out differently if the drive reported a single
> discrete write error? I don't know how any file system tolerates this
> because it's so rare. Would ext4 just try to write again? Would it try
> to write to the same sector or another one? Or maybe the write finally
> succeeds by resulting in a remap? But this sure is dang slow to
> recover from a bad write. I don't understand the engineering rationale
> for this. Maybe it's a firmware bug?
> 
> 
Could be. At the very least it's an issue with EH (error handling)
interaction: ATA COMRESET fails, i.e. libata EH fails to reset the SATA
link. That is pretty terminal, so the device is set to offline afterwards.
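
For anyone wanting to confirm this on an affected machine: the offline
state is visible in sysfs. A minimal Python sketch, assuming the disk is
exposed as a SCSI device under /sys/block (the function name and the
sysfs_root parameter are hypothetical, chosen only for illustration):

```python
from pathlib import Path


def scsi_device_status(disk, sysfs_root="/sys"):
    """Return the SCSI 'state' and command 'timeout' sysfs attributes for a disk.

    'state' reads e.g. "running" or "offline"; 'timeout' is the SCSI
    command timeout in seconds. A value of None means the attribute
    could not be read (no such disk, or not a SCSI/libata device).
    """
    dev = Path(sysfs_root) / "block" / disk / "device"
    status = {}
    for attr in ("state", "timeout"):
        try:
            status[attr] = (dev / attr).read_text().strip()
        except OSError:
            status[attr] = None
    return status


if __name__ == "__main__":
    import sys

    # Disk name is an assumption; pass the real one as the first argument.
    disk = sys.argv[1] if len(sys.argv) > 1 else "sda"
    print(disk, scsi_device_status(disk))
```

If "state" reads offline after the failed resets, the device has been
taken out of service as described above, and further I/O to it fails
immediately rather than timing out.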

This is most definitely an ATA issue, and doesn't really belong in this
context.
(Have you reported it on linux-ide?)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@xxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)