Re: [PATCH] ext4: Set file system to read-only by I/O error threshold

"Ted Ts'o" <tytso@xxxxxxx> · Tue, 21 Jun 2011 10:48:56 -0400

Ugh.  This is really, really, *really* ugly.  If you really want to
have hadoop shut down when there are too many errors, it's much better
to expose the number of EIO errors via sysfs, and then have some kind
of system daemon do the right thing.
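
To make the sysfs-plus-daemon idea concrete, here is a minimal sketch of
the userspace half.  It assumes the kernel exposes a per-filesystem EIO
counter as a plain-text integer file; no such file is confirmed here, so
the path, the threshold, and the callback are all hypothetical and
supplied by the caller.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def read_error_count(counter_path: Path) -> int:
    """Read an integer error counter from a sysfs-style text file."""
    return int(counter_path.read_text().strip())


def watch_error_count(
    counter_path: Path,          # hypothetical sysfs file holding the EIO count
    threshold: int,              # policy lives in userspace, not in the kernel
    on_threshold: Callable[[int], None],
    poll_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll the counter and call on_threshold(count) once it reaches threshold.

    Returns True if the threshold was reached, False if max_polls ran out.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        count = read_error_count(counter_path)
        if count >= threshold:
            on_threshold(count)
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

The point of the split is that the "right thing" (decommissioning the
node, paging someone, and so on) becomes a policy decision made in the
callback, rather than something hard-wired into ext4.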

But actually converting EIO's to file system errors isn't really a
good idea.  Consider that most of the time, when you get a read error
from the disk, if you rewrite that block, all will be well.  So taking
the entire disk off-line, and setting the errors fs bit won't really
help.  (a) Until the block is rewritten, the next time you try to read
it, you'll get an error, and (b) running fsck will be a waste of time,
since it will only scan the metadata blocks, and so the data block
will still have an error.

I assume you're using hadoopfs as your cluster file system, which has
redundancy at the file system level, right?  So getting an EIO won't
be the end of the world, since you can always read the data chunk from a
redundant copy, or perform a Reed-Solomon reconstruction.  In fact,
disabling the entire file system is the worst thing you can do, since
you lose access to the rest of the files, which increases network
traffic on your cluster interconnect, especially if you have to do an R-S
reconstruction.

(In fact I've recently written up a plan to turn metadata errors into
EIO's, without marking the entire file system as containing errors,
to make the file system more resilient to I/O errors --- the
exact reverse of what you're trying to do.)

For data I/O errors, what you in fact want to do is to handle them in
userspace, and just have HDFS delete the local copy of the file.  The
next time you allocate the space and rewrite the block, the disk will
do a bad block remap, and you'll be OK.
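
As a rough illustration of handling a data EIO in userspace, the sketch
below reads a local file and, if the read fails with EIO, deletes that
local copy so it can be re-fetched from a redundant copy and rewritten.
This is not HDFS code; read_or_discard and the injectable reader are
hypothetical names used only for illustration, and telling the cluster
that the copy is gone is left to the caller.

```python
import errno
import os
from typing import Callable, Optional


def _read_file(path: str) -> bytes:
    """Default reader: return the whole file as bytes."""
    with open(path, "rb") as f:
        return f.read()


def read_or_discard(
    path: str,
    reader: Callable[[str], bytes] = _read_file,
) -> Optional[bytes]:
    """Return the file's contents, or delete the file and return None on EIO."""
    try:
        return reader(path)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise  # not an I/O error from the device; let the caller decide
        # Drop the unreadable local copy; the rest of the file system
        # stays mounted and usable.
        os.unlink(path)
        return None
```

A None return is the caller's cue to fetch the chunk from another
replica.  Errors other than EIO are deliberately re-raised, since a
vanished or dead disk calls for different handling than one bad block.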

Now, you may want to do different things if the disk has completely
disappeared, or has completely died, so this is a case where it would
be desirable to get finer grained error reporting from the block I/O
layer --- there's a big difference between what you do for an error
caused by an unreadable block, and one caused by the disk controller
bursting into flame.  But in general, remounting the file system
read-only should be a last-resort thing, and not the first thing you
should try doing.

Regards,

						- Ted