Ugh. This is really, really, *really* ugly. If you really want to have Hadoop shut down when there are too many errors, it's much better to expose the number of EIO errors via sysfs, and then have some kind of system daemon do the right thing. But actually converting EIOs to file system errors isn't really a good idea.

Consider that most of the time, when you get a read error from the disk, if you rewrite that block, all will be well. So taking the entire disk off-line and setting the errors fs bit won't really help. (a) Until the block is rewritten, the next time you try to read it, you'll get an error, and (b) running fsck will be a waste of time, since it will only scan the metadata blocks, and so the data block will still have an error.

I assume you're using hadoopfs as your cluster file system, which has redundancy at the file system level, right? So getting an EIO won't be the end of the world, since you can always read the data chunk from a redundant copy, or perform a Reed-Solomon reconstruction. In fact, disabling the entire file system is the worst thing you can do, since you lose access to the rest of the files on that disk, which increases network traffic on your cluster interconnect, especially if you have to do an R-S reconstruction.

(In fact I've recently written up a plan to turn metadata errors into EIOs, without flagging the entire file system as containing errors, to make the file system more resilient to I/O errors --- the exact reverse of what you're trying to do.)

For data I/O errors, what you in fact want to do is to handle them in userspace, and just have HDFS delete the local copy of the file. The next time you allocate the space and rewrite the block, the disk will do a bad block remap, and you'll be OK.
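The userspace policy for data I/O errors described above could look roughly like the following. This is only a sketch in Python for brevity, not how HDFS actually implements it: read_chunk_or_drop, read_fn and _read_whole_file are made-up names, and it assumes the caller treats a None return as "fetch this chunk from another replica (or reconstruct it) and re-replicate".

```python
import errno
import os


def _read_whole_file(path):
    # Default reader (hypothetical helper): read the whole local chunk file.
    with open(path, "rb") as f:
        return f.read()


def read_chunk_or_drop(path, read_fn=None):
    """Return the chunk's bytes, or None if the local copy hit EIO and was dropped.

    read_fn is an assumed hook so the reader can be swapped out (for example
    in tests); it is not part of any real HDFS API.
    """
    if read_fn is None:
        read_fn = _read_whole_file
    try:
        return read_fn(path)
    except OSError as e:
        if e.errno != errno.EIO:
            raise  # not a data I/O error; leave the decision to the caller
        # Unreadable block: delete only this local copy and keep serving the
        # rest of the disk. When the space is later reallocated and rewritten,
        # the disk gets its chance to remap the bad block.
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
        return None
```

The point of the design is that the failure stays scoped to one file: nothing here remounts the file system read-only or marks it as needing fsck, and every other chunk on the disk remains readable.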
Now, you may want to do different things if the disk has completely disappeared, or has completely died, so this is a case where it would be desirable to get finer grained error reporting from the block I/O layer --- there's a big difference between what you do for an error caused by an unreadable block, and one caused by a disk controller bursting into flames. But in general, remounting the file system read-only should be a last-resort thing, and not the first thing you should try doing.

Regards,

- Ted