Re: URE, link resets, user hostile defaults

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Fri, 1 Jul 2016 14:43:37 -0600

Here's a fun one I just got off the Fedora users mailing list: a laptop
drive that's apparently hanging on *write*. I would not expect a failed
write to take a drive long to figure out, but... there are more resets
than there are write errors. In fact there's no discrete write error
from the drive at all; all we know is that the failed command is a
WRITE command.

What seems to happen is that everything in the queue gets obliterated
in the reset, and when ext4 finds out that everything failed, not just
one write, it barfs and goes read-only.

http://pastebin.com/3JAL297z
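
For anyone who wants to check the "more resets than write errors"
observation against a kernel log like the one linked above, here is a
rough Python sketch that tallies the two. The regexes are assumptions
modeled on the usual libata and block-layer message wording ("hard
resetting link", "failed command: ...", "I/O error, dev ..."), not
something lifted from the pastebin, and the name summarize() is made up
for illustration. Adjust the patterns to whatever your kernel prints.

```python
import re

# ASSUMPTION: these patterns follow typical libata / block-layer wording.
# They are illustrative and may need adjusting for a given kernel version.
RESET_RE = re.compile(r"\bata\d+(?:\.\d+)?: (?:hard|soft) resetting link")
FAILED_CMD_RE = re.compile(r"\bata\d+\.\d+: failed command: (\S+)")
IO_ERROR_RE = re.compile(r"I/O error, dev \w+, sector \d+")


def summarize(log_text):
    """Count link resets, failed ATA commands, and block I/O errors.

    Hypothetical helper: takes the full text of a kernel log and returns
    a dict of counters, so resets can be compared with failed writes.
    """
    counts = {
        "link_resets": 0,
        "failed_writes": 0,
        "failed_other": 0,
        "block_io_errors": 0,
    }
    for line in log_text.splitlines():
        if RESET_RE.search(line):
            counts["link_resets"] += 1
        match = FAILED_CMD_RE.search(line)
        if match:
            # The first word of the command name tells reads from writes,
            # e.g. "WRITE FPDMA QUEUED" vs "READ FPDMA QUEUED".
            if match.group(1).startswith("WRITE"):
                counts["failed_writes"] += 1
            else:
                counts["failed_other"] += 1
        if IO_ERROR_RE.search(line):
            counts["block_io_errors"] += 1
    return counts
```

Feed it the text of a saved log, for example the contents of a file
captured from dmesg, and compare counts["link_resets"] with
counts["failed_writes"]. It only counts lines, so it says nothing about
why the drive stalls.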

How might this turn out differently if the drive reported a single
discrete write error? I don't know how any file system handles a write
error, because they're so rare. Would ext4 just try to write again?
Would it try to write to the same sector or another one? Or maybe the
write finally succeeds by way of a remap? Either way, this sure is a
dang slow way to recover from a bad write. I don't understand the
engineering rationale for this. Maybe it's a firmware bug?

Chris Murphy