On Thu, Sep 11, 2008 at 07:43:18AM +0200, Tobias Oetiker wrote:
>
> What I am hoping for, is that someone tells me, that in the case of
> 'data=journal' the loss would only be the material that is still in
> the journal (eg 30 seconds worth of data) and the rest of the fs
> would have a fair chance of being recovered with fsck.
>

The paper you quoted essentially indicated that ext3's JBD layer
wasn't checking for error cases sufficiently. It has improved since
then, but when I did a quick audit of the code paths, I was able to
find a few places where we still aren't checking the error returns,
for example when calling sync_dirty_buffer().

In general, though, if there is a failure to write to the SSD, it
should get detected fairly quickly, at which point the journal will
get aborted, which will suspend writes to the filesystem. It may not
happen as quickly as we might like, and if you get really unlucky and
a singleton write fails and it's one where the error return doesn't
get checked, you could end up writing garbage to the filesystem on a
journal replay.

In that worst case scenario, you might end up losing a full inode
table block's worth of inodes, but in general, the loss should be the
last few minutes worth of data. Fsck has a better than normal chance
of recovering from a busted journal.

That being said, it would be wise to monitor the health of the SSD via
S.M.A.R.T., since I would suspect that failures of the SSD should be
easily predicted by the firmware.

On Thu, Sep 11, 2008 at 09:13:21AM +0100, Chris Haynes wrote:
>
> Is it perhaps the case that, to maximize the integrity of the main
> data, one would *want* the journal to have a different failure
> pattern?
>
> That, if there were any doubt about journal integrity, it would be
> better (for the integrity of the main file system) to discard the
> journal entirely?
>
> This would suggest the use of a robust hash / cryptographic digest
> of the journal contents, stored with it and checked each time the
> journal is about to be used. These are quite quick to compute
> nowadays.

Indeed, this is what ext4 does; there is a checksum in each commit
record to detect these problems (you don't need a cryptographic digest
since, contrary to most sysadmins' fears, hard drives are *not*
malicious sentient beings :-), and if a problem is found, we abort
running the journal right then and there.

It is possible that this change means you will lose more data, not
less. If there is a singleton failure writing a single block early in
the journal, aborting the journal means that we don't replay any of
the later journal commits. It could very well be that the corrupted
data block was rewritten successfully to the journal in a later
commit, in which case continuing the journal recovery would have been
the right thing to do. On the other hand, if the corrupted block was a
journal descriptor, aborting the journal is the best thing you could
do.

So in theory you might end up losing not just the last 30 seconds, but
more like the last couple of minutes worth of data. (Even data which
was fsync'ed, since fsync only guarantees that the data was written to
some stable storage; fsync makes no guarantees about what might happen
if your stable storage, including the journal, fails to store data
correctly.)

We've talked about changing the journalling code to write a separate
checksum for each block, which would allow us to more intelligently
recover from a failed checksum in the journal block. It wouldn't be a
trivial thing to add, so we haven't added that to date. And this is a
relatively unlikely case, which involves an (undetected) single write
failure, followed by a crash at just the wrong time, before the
journal has a chance to wrap.
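
To make the trade-off above concrete, here is a toy model of replaying
a journal that carries one checksum per commit. It is only a sketch
under stated assumptions and is not the JBD/JBD2 on-disk format or
code: the Commit class, the helper names, the dict standing in for the
filesystem, and the use of CRC32 are all hypothetical choices made for
illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy journal transaction (hypothetical, not the JBD2 layout)."""
    blocks: dict   # block number -> bytes as they ended up in the journal
    checksum: int  # CRC32 recorded in the commit record at write time


def checksum_blocks(blocks):
    """CRC32 over every (block number, contents) pair, in block order."""
    crc = 0
    for blkno in sorted(blocks):
        crc = zlib.crc32(blkno.to_bytes(8, "little"), crc)
        crc = zlib.crc32(blocks[blkno], crc)
    return crc


def make_commit(blocks):
    """Build a commit whose stored checksum matches its blocks."""
    return Commit(dict(blocks), checksum_blocks(blocks))


def replay(journal, fs):
    """Replay commits in order into fs, aborting at the first commit
    whose checksum does not match. Returns the number replayed."""
    replayed = 0
    for commit in journal:
        if checksum_blocks(commit.blocks) != commit.checksum:
            break  # abort: this commit and all later ones are dropped
        fs.update(commit.blocks)
        replayed += 1
    return replayed


# Three commits; the third one rewrites block 2 successfully.
journal = [
    make_commit({1: b"A1"}),
    make_commit({2: b"B1"}),
    make_commit({2: b"B2", 3: b"C1"}),
]

# Simulate an undetected bad write of block 2 in the second commit.
journal[1].blocks[2] = b"garbage"

fs = {}
print(replay(journal, fs))  # 1
print(fs)                   # {1: b'A1'}
```

In this model the good copy of block 2 in the third commit is thrown
away along with everything else after the bad commit, which is the
"lose more data, not less" case. A per-block checksum would let replay
skip only the damaged block instead of stopping.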

Also, ext4 is even better than ext3 in terms of checking error returns
(although to be honest, when I did a quick audit just now I still did
find a few places where we should add some error checks; I'll work on
getting fixes submitted for both ext3 and ext4).

						- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users