Andreas Dilger wrote:
On May 28, 2006 22:07 -0400, Ric Wheeler wrote:
you could build a
file system on top of a collection of disk partitions/LUNs and then
your inode could be extended to encode the partition number and
the internal mapping. You could even harden the block groups to the
point that fsck could heal one group while the file system was (mostly?)
online backed up by the rest of the block groups...
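A minimal sketch of the inode encoding idea above, assuming a 64-bit inode number whose high 16 bits name the partition/LUN and whose low 48 bits hold the inode number inside that partition. The bit split and the names encode_ino/decode_ino are illustrative assumptions, not anything from an existing file system.

```python
from typing import Tuple

# Assumed layout (hypothetical): [ 16-bit partition | 48-bit local inode ]
PART_BITS = 16
LOCAL_BITS = 48
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def encode_ino(partition: int, local_ino: int) -> int:
    """Pack a partition index and a per-partition inode number into one value."""
    if not 0 <= partition < (1 << PART_BITS):
        raise ValueError("partition index out of range")
    if not 0 <= local_ino <= LOCAL_MASK:
        raise ValueError("local inode number out of range")
    return (partition << LOCAL_BITS) | local_ino


def decode_ino(ino: int) -> Tuple[int, int]:
    """Recover (partition, local inode) from a packed inode number."""
    return ino >> LOCAL_BITS, ino & LOCAL_MASK
```

With this split, decode_ino(encode_ino(3, 12345)) gives back (3, 12345), so a lookup can pick the partition first and then use the local number for the internal mapping.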
This is one thing that we have been thinking of for ext3. Instead of a
filesystem-wide "error" bit we could move this per-group to only mark
the block or inode bitmaps in error if they have a checksum failure.
This would prevent allocations from that group to avoid further potential
corruption of the filesystem metadata.
Once an error is detected then a filesystem service thread or a userspace
helper would walk the inode table (starting in the current group, which
is most likely to hold the relevant data) recreating the respective bitmap
table and keeping a "valid bit" bitmap as well. Once all of the bits
in the bitmap are marked valid then we can start using this group again.
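The per-group repair described above can be sketched roughly as follows. This is a toy in-memory model, not ext3 code: the Group class, mark_error() and rebuild() are hypothetical names, each inode is reduced to a list of (group, block) references, and it assumes that once every inode table has been walked, any block nobody referenced is known to be free.

```python
from typing import List, Optional, Tuple

Inode = List[Tuple[int, int]]  # (group index, block index) references


class Group:
    def __init__(self, nblocks: int) -> None:
        self.in_use = [False] * nblocks  # block allocation bitmap
        self.valid = [True] * nblocks    # "valid bit" bitmap
        self.in_error = False            # per-group error flag

    def allocate(self) -> Optional[int]:
        """Return a free block, or None if the group is full or in error."""
        if self.in_error:
            return None  # no allocations from a group under repair
        for blk, used in enumerate(self.in_use):
            if not used:
                self.in_use[blk] = True
                return blk
        return None


def mark_error(group: Group) -> None:
    """Checksum failure: distrust the whole bitmap and block allocations."""
    n = len(group.in_use)
    group.in_error = True
    group.in_use = [False] * n
    group.valid = [False] * n


def rebuild(groups: List[Group], inode_tables: List[List[Inode]], bad: int) -> None:
    """Recreate the bitmap of group `bad` by walking every inode table."""
    target = groups[bad]
    # Start with the damaged group's own inode table, then the others.
    order = [bad] + [g for g in range(len(groups)) if g != bad]
    for g in order:
        for inode in inode_tables[g]:
            for grp, blk in inode:
                if grp == bad:
                    target.in_use[blk] = True
                    target.valid[blk] = True
    # Assumption: after a full walk, unreferenced blocks are known free.
    target.valid = [True] * len(target.valid)
    if all(target.valid):
        target.in_error = False  # group can be used again
```

While in_error is set, allocate() refuses to hand out blocks from the group, which is the "prevent allocations from that group" behaviour; reads of blocks whose valid bit is already set could in principle continue during the walk.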
That is a neat idea - would you lose complete access to the impacted
group, or have you thought about "best effort" read-only while under repair?
One thing that has worked very well for us is that we keep a digital
signature of each user object (MD5, SHA-x hash, etc.) so we can validate
that what we wrote is what got read back. This also provides a very
powerful sanity check after getting hit by failing media or severe file
system corruption since whatever we do manage to salvage (which might
not be all files) can be validated.
As an archival (write once, read infrequently) storage device, this
works pretty well for us since the signature does not need to be constantly
recomputed on each write/append.
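As a rough illustration of the validation step, here is a sketch that records a digest per object at write time and later checks whatever was salvaged against that record. SHA-256 is just one choice of hash, and the manifest dictionary and function names are assumptions made for the example, not a description of the actual product.

```python
import hashlib
from typing import Dict


def digest(data: bytes) -> str:
    """Digest recorded once, when the object is written."""
    return hashlib.sha256(data).hexdigest()


def validate_salvaged(salvaged: Dict[str, bytes],
                      manifest: Dict[str, str]) -> Dict[str, bool]:
    """Map each salvaged object name to True if its content still matches.

    Objects with no recorded digest are reported as False (unverifiable).
    """
    return {name: digest(data) == manifest.get(name)
            for name, data in salvaged.items()}
```

After a recovery pass, only the objects that map to True can be trusted to be what was originally written; the rest are either damaged or cannot be checked.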
For general-purpose read/write workloads, I wonder if it would make
sense to compute and store such a checksum or signature on close (say in
an extended attribute)? It might be useful to use another of those
special attributes (like the immutable attribute) to indicate that this file
is important enough to digitally sign on close.
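A userspace approximation of sign-on-close might look like the sketch below. It is only an illustration: the attribute name "user.sha256", the flagged argument (standing in for the proposed special attribute) and the injected set_attr callback are all assumptions, and a real implementation would live in the file system rather than in the application.

```python
import hashlib
from typing import Callable

DIGEST_ATTR = "user.sha256"  # hypothetical extended attribute name


def file_digest(path: str, chunk: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def sign_on_close(path: str, flagged: bool,
                  set_attr: Callable[[str, str, bytes], None]) -> bool:
    """Call after the last close; stores the digest only for flagged files.

    Returns True if a digest was stored, False if the file was not flagged.
    """
    if not flagged:
        return False
    set_attr(path, DIGEST_ATTR, file_digest(path).encode("ascii"))
    return True
```

On Linux, Python's os.setxattr takes (path, attribute, value) and could be passed as set_attr, provided the underlying file system supports user extended attributes. Computing the digest once at close avoids rehashing on every write/append, at the cost of one full read of the file.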
Regards,
Ric