On Fri, Apr 27, 2007 at 11:00:41AM +0200, Mathieu Avila wrote:
> Hello all,
>
> From what I understand of the GFS1 source code, I/O errors are not
> managed: when an I/O error happens, the node either exits the locking
> protocol's cluster (Gulm or CMAN), or sometimes it asserts/panics.
>
> Anyway, most of the time, the node that got an I/O error must be
> rebooted (the file system layer is unstable), the device must be checked,
> and the file system must be fsck'ed.
>
> Are there any plans for cleaner management of I/O errors in GFS1,
> like, say, remounting read-only with -EIO returned to apps, or even
> better, advanced features like relocation mechanisms? Is it planned
> for GFS2?

You can get very close to what you're asking for with the "withdraw"
feature, which has existed in gfs1 since rhel4. When gfs detects an I/O
error, it does a "withdraw" on that fs, which means shutting it down:
returning EIO to anything accessing it, telling other nodes to do journal
recovery for it, and dropping all global locks that were held. Then you
can unmount the withdrawn fs.

It's mainly about getting the node with the errors out of the way of the
other nodes so they can continue. It also allows you to shut down and
reboot the node experiencing errors in a controlled fashion.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
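To make the withdraw sequence Dave describes more concrete, here is a toy model of it. This is a hypothetical sketch, not GFS code: the names `Mount`, `ClusterNode`, `withdraw` and `recover_journal` are invented for illustration, a negative block number stands in for a device I/O error, and "which peer replays the journal" is simplified to "the first peer in the list". It only shows the ordering of the steps: fail with EIO, hand journal recovery to another node, drop the node's cluster-wide locks, then allow an unmount.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    """Hypothetical peer node that can replay another node's journal."""
    name: str
    recovered_journals: list = field(default_factory=list)

    def recover_journal(self, journal_id):
        # Replay the failed node's journal so shared metadata stays consistent.
        self.recovered_journals.append(journal_id)


@dataclass
class Mount:
    """Hypothetical per-node mount of a shared cluster filesystem."""
    node: str
    journal_id: int
    peers: list
    held_locks: set = field(default_factory=set)
    withdrawn: bool = False
    mounted: bool = True

    def read(self, block):
        if self.withdrawn:
            # After a withdraw, every access fails with EIO.
            raise OSError(errno.EIO, "filesystem withdrawn")
        if block < 0:
            # Stand-in for a device I/O error: withdraw instead of panicking.
            self.withdraw()
            raise OSError(errno.EIO, "I/O error, filesystem withdrawn")
        return f"data@{block}"

    def withdraw(self):
        if self.withdrawn:
            return
        # 1. Shut the fs down locally so later accesses get EIO.
        self.withdrawn = True
        # 2. Ask another node to recover this node's journal
        #    (simplified here: the first peer does it).
        if self.peers:
            self.peers[0].recover_journal(self.journal_id)
        # 3. Drop all cluster-wide locks this node was holding.
        self.held_locks.clear()

    def unmount(self):
        # The withdrawn fs can now be unmounted and the node rebooted.
        self.mounted = False
```

In use, a failed read on `node1` leaves the other nodes able to continue: `node2` has replayed `node1`'s journal, no locks are left held by `node1`, and `node1` can be unmounted in a controlled way rather than panicking.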