On Mon, Aug 28, 2006 at 03:04:26PM -0400, Sev Binello wrote:
> Can anyone tell us what the expected behavior is,
> in the event that ext3 loses total contact with the storage system ?
>
> We have found that the file system is put into read only mode,
> it is then found to contain errors, and requires an fsck.
> Sometimes the fsck finds numerous (some serious looking) errors,
> and that running without fsck doesn't seem like a safe option.
>
> We are trying to understand why exactly this is.
> Why do we get errors ? Why serious ones ?
>

The filesystem should go read-only when you try to modify it.
HOWEVER, the problem comes when connectivity is restored.  When an
attempt to modify the filesystem fails, the journal is aborted and an
I/O error is returned.  However, there may be modified blocks left
hanging about in the buffer cache from before the kernel realized
that connectivity had been lost, and what we need to do is to make
sure that all dirty blocks in the buffer cache and page cache are
dropped.  Otherwise they can still get written out once the device is
reachable again, even though the journal has been aborted.

Basically, if I'm right, this is a bug, which we need to fix.  The
patch would require flushing (that is, discarding, not writing out)
all modified buffers and page cache pages when the filesystem goes
read-only.  The modified buffers are the more important thing, since
that's what causes the filesystem corruption, although for
correctness's sake we should be flushing any modified page cache
pages as well.

I don't have time to code this right now, but I'll try to get a patch
out relatively soonish, if you're willing to try it to see if it
addresses your observed problem.

						- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
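
The failure mode described in the reply above can be illustrated with a
toy model.  This is only a sketch in Python, not kernel code: the Disk
and CachedFs classes, the block names "bitmap" and "inode", and the
drop_dirty_on_abort switch are all hypothetical and stand in for the
block device, the buffer cache, and the proposed fix.  It assumes the
corruption comes from a stale dirty block being written back after
connectivity returns.

```python
class Disk:
    """Hypothetical block device that can become unreachable."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.online = True

    def write(self, name, data):
        if not self.online:
            raise OSError("device unreachable")
        self.blocks[name] = data


class CachedFs:
    """Toy filesystem with a write-back cache of dirty blocks."""

    def __init__(self, disk, drop_dirty_on_abort):
        self.disk = disk
        self.drop_dirty_on_abort = drop_dirty_on_abort
        self.dirty = {}
        self.read_only = False

    def modify(self, name, data):
        if self.read_only:
            raise PermissionError("read-only filesystem")
        self.dirty[name] = data  # cached only, not yet on disk

    def writeback(self):
        for name in list(self.dirty):
            try:
                self.disk.write(name, self.dirty[name])
            except OSError:
                self.abort()  # stands in for the journal abort
                return
            del self.dirty[name]

    def abort(self):
        self.read_only = True
        if self.drop_dirty_on_abort:
            self.dirty.clear()  # the proposed fix: discard, don't write


def simulate(drop_dirty_on_abort):
    disk = Disk({"bitmap": "v1", "inode": "v1"})
    fs = CachedFs(disk, drop_dirty_on_abort)
    fs.modify("bitmap", "v2")     # first half of an update
    disk.online = False           # storage disappears
    fs.writeback()                # write fails, fs goes read-only
    try:
        fs.modify("inode", "v2")  # second half is refused
    except PermissionError:
        pass
    disk.online = True            # connectivity is restored
    fs.writeback()                # a later writeback pass runs
    return disk.blocks


print(simulate(drop_dirty_on_abort=False))
print(simulate(drop_dirty_on_abort=True))
```

In this model, leaving the dirty block in the cache lets half of an
update reach the disk after the device comes back, so the two blocks
disagree.  Discarding dirty blocks at abort time leaves the on-disk
state as it was before the outage.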