On Feb 16, 2007 11:25 -0500, Sev Binello wrote:
> Theodore Tso wrote:
> >On Mon, Aug 28, 2006 at 03:04:26PM -0400, Sev Binello wrote:
> >>Can anyone tell us what the expected behavior is
> >>in the event that ext3 loses total contact with the storage system?
> >>
> >>We have found that the file system is put into read-only mode,
> >>it is then found to contain errors, and it requires an fsck.
> >>Sometimes the fsck finds numerous (some serious-looking) errors,
> >>and running without fsck doesn't seem like a safe option.
> >>
> >>We are trying to understand why exactly this is.
> >>Why do we get errors? Why serious ones?
> >
> >The filesystem should go read-only when you try to modify it.
> >HOWEVER, the problem comes when connectivity is restored. When an
> >attempt to modify the filesystem fails, the journal is aborted and an
> >I/O error is returned. However, there may be modified blocks left hanging
> >about in the buffer cache from before the kernel realized that connectivity
> >had been lost, and what we need to do is make sure that all dirty
> >blocks in the buffer cache and page cache are dropped.

In fact, there are a number of other places as well, such as the elevator
and the IDE/SCSI/LVM layers, that can be hung up on timeouts and retries
for a long time. It would be nice if the filesystem could abort all
pending I/Os in the underlying layers.

> >Basically, if I'm right, this is a bug, which we need to fix. The
> >patch would require flushing all modified buffers and page cache pages
> >when the filesystem goes read-only. The modified buffers are the more
> >important thing, since they are what causes the filesystem corruption,
> >although for correctness's sake we should be flushing any modified
> >page cache pages as well. I don't have time to code this right now,
> >but I'll try to get a patch out relatively soonish, if you're
> >willing to try it to see if it addresses your observed problem.

We talked at one time about marking the block device read-only via set_device_ro().
That would prevent any of the blocks from being flushed out by the block layer.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
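The failure mode in Ted's explanation can be made concrete with a toy Python model. This is not ext3, JBD, or kernel code; `BlockDevice`, `BufferCache`, `run`, `consistent` and the policy names are all hypothetical. One logical update dirties two metadata blocks, the storage link drops while the first is being written, and background writeback resumes once the link returns. With no cleanup, the second block reaches disk alone and the inode points at a block the bitmap still calls free. Dropping dirty buffers on abort, or refusing writes at the device (loosely mimicking what Andreas suggests set_device_ro() would do), leaves the old, consistent state. The model leaves out the journal entirely and assumes a failed write is discarded rather than retried.

```python
class BlockDevice:
    """Toy disk. `connected` models the storage link; `read_only` loosely
    mimics marking the device read-only (an assumption, not kernel code)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.connected = True
        self.read_only = False

    def write(self, blkno, data):
        if not self.connected:
            raise OSError("storage unreachable")
        if self.read_only:
            raise OSError("device is read-only")
        self.blocks[blkno] = data


class BufferCache:
    """Toy buffer cache holding dirty blocks not yet written to disk."""

    def __init__(self, dev):
        self.dev = dev
        self.dirty = {}

    def modify(self, blkno, data):
        self.dirty[blkno] = data

    def flush_one(self, blkno):
        # The buffer leaves the dirty set before the write is attempted,
        # so a failed write is discarded, not retried (simplification).
        data = self.dirty.pop(blkno)
        self.dev.write(blkno, data)

    def writeback(self):
        # Background writeback: try every remaining dirty block,
        # ignoring blocks whose write fails.
        for blkno in list(self.dirty):
            try:
                self.flush_one(blkno)
            except OSError:
                pass

    def drop_dirty(self):
        self.dirty.clear()


def run(policy):
    """policy: "none", "drop" (discard dirty buffers on abort) or
    "ro" (mark the device read-only on abort)."""
    dev = BlockDevice({1: "bitmap: blk9 free", 2: "inode: no blocks"})
    cache = BufferCache(dev)
    # One logical update that must reach disk all-or-nothing.
    cache.modify(1, "bitmap: blk9 used")
    cache.modify(2, "inode: -> blk9")

    dev.connected = False          # the storage link goes away
    try:
        cache.flush_one(1)         # this write fails ...
    except OSError:
        # ... so the journal is aborted and the fs goes read-only.
        if policy == "drop":
            cache.drop_dirty()
        elif policy == "ro":
            dev.read_only = True

    dev.connected = True           # the link comes back
    cache.writeback()              # leftover dirty blocks are attempted
    return dev.blocks


def consistent(blocks):
    # The inode may reference blk9 only if the bitmap marks it in use.
    return ("-> blk9" in blocks[2]) == ("blk9 used" in blocks[1])


for policy in ("none", "drop", "ro"):
    print(policy, consistent(run(policy)))
# prints: none False / drop True / ro True (one per line)
```

In this sketch both remedies work only because nothing else can push the stale block out; the "ro" policy corresponds to stopping writes at the lowest layer, which is why it also covers buffers that other code paths might flush.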