On Mon, Jan 28, 2013 at 5:40 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Mon, Jan 28, 2013 at 04:20:11PM -0800, Darrick J. Wong wrote:
>> On Mon, Jan 28, 2013 at 03:27:38PM -0800, David Lang wrote:
>> > The situation I'm thinking of is when dealing with VMs, you make a
>> > filesystem image once and clone it multiple times. Won't that end up
>> > with the same UUID in the superblock?
>>
>> Yes, but one ought to be able to change the UUID a la tune2fs -U. Even
>> still... so long as the VM images have a different UUID than the fs
>> that they live on, it ought to be fine.
>
> ... and this is something most system administrators should be
> familiar with. For example, it's one of those things that Norton
> Ghost does when it makes file system image copies (the equivalent of
> "tune2fs -U random /dev/XXX").

Hmm, maybe I missed something, but it does not seem like a good idea to
use the volume UUID itself to seed unique-per-volume metadata hashes if
users expect to be able to change it: every metadata hash on the volume
would then need to be recomputed and rewritten (rough sketch in the
P.S. below).

Anyway, our primary line of attack on this problem is not unique
hashes, but actually knowing which blocks belong to files and which do
not. Before (a hypothetical) Tux3 fsck repair would be so bold as to
reattach some lost metadata to the place it thinks it belongs, all of
the following would need to be satisfied:

 * The lost metadata subtree is completely detached from the filesystem
   tree. In other words, it cannot possibly be the contents of some
   valid file already belonging to the filesystem. I believe this
   addresses the concern David Lang raised at the head of this thread.

 * The filesystem tree is incomplete. Somewhere in it, Tux3 fsck has
   discovered a hole that needs to be filled.

 * The lost metadata subtree is complete and internally consistent,
   except for not being attached to the filesystem tree.

 * The lost metadata subtree matches a hole where metadata is missing,
   according to its "uptags", which specify at least the low-order bits
   of the inode number the metadata belongs to and the offset at which
   it belongs (see the second sketch below).

 * Tux3 fsck has asked the user whether this lost metadata (described
   in some reasonable way) should be attached to the particular
   filesystem object that appears to be incomplete.

Alternatively, the lost subtree may be attached to the traditional
"lost+found" directory, though we are able to be somewhat more specific
about where the subtree might originally have belonged, and can name
the lost+found object accordingly.

Additionally, Tux3 fsck might consider the following:

 * If the allocation bitmaps appear to be undamaged, but some or all of
   a lost filesystem tree is marked as free space, then the subtree most
   likely is free space and no attempt should be made to attach it to
   anything.

Thanks for your comments. I look forward to further review as things
progress.

One thing to consider: this all gets much more interesting when
versioning arrives. For shared-tree snapshotting filesystem designs,
it must get very interesting indeed, to the point where even
contemplating the corner cases makes me shudder. But even with
versioning, Tux3 still upholds the single-reference rule, so our fsck
problem will continue to look a lot more like Ext4's than like Btrfs's
or ZFS's. Which suggests some great opportunities for unabashed
imitation.

Regards,

Daniel
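
P.S. A rough sketch of the UUID objection above. This is hypothetical
code, not Tux3 source; the hash (a toy FNV-1a) and the salting scheme
are stand-ins chosen only to show why relabeling a volume would
invalidate every stored checksum:

    #include <stddef.h>
    #include <stdint.h>

    #define UUID_LEN 16

    /* Toy 32-bit FNV-1a; a real filesystem would use crc32c or similar. */
    static uint32_t fnv1a(const void *data, size_t len, uint32_t seed)
    {
            const uint8_t *p = data;
            uint32_t h = seed;

            while (len--) {
                    h ^= *p++;
                    h *= 16777619u;
            }
            return h;
    }

    /* Checksum of one metadata block, salted with the volume UUID. */
    uint32_t metablock_sum(const uint8_t uuid[UUID_LEN],
                           const void *block, size_t size)
    {
            uint32_t seed = fnv1a(uuid, UUID_LEN, 2166136261u);

            return fnv1a(block, size, seed);
    }

Because every stored sum depends on the UUID, a "tune2fs -U" style
relabel would force a full read-sum-rewrite pass over all metadata,
which is the cost I am objecting to.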
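
P.P.S. And a similar sketch of the uptag matching test from the
checklist above. Again hypothetical: the struct layout, field names and
widths are illustrative assumptions, not the actual on-disk format.

    #include <stdbool.h>
    #include <stdint.h>

    struct uptag {
            uint32_t inum_low;      /* low-order bits of the owning inode number */
            uint32_t offset;        /* logical offset within that inode */
    };

    struct hole {
            uint64_t inum;          /* inode fsck found to be incomplete */
            uint32_t offset;        /* offset of the missing metadata */
    };

    #define INUM_LOW_BITS 24        /* assumed width of the stored low bits */

    /*
     * A lost subtree is a candidate for a hole only if its uptag agrees
     * with the hole on both the low inode bits and the offset. Because
     * the high inode bits are not recorded, this can produce false
     * positives, which is exactly why fsck must still ask the user
     * before reattaching anything.
     */
    static bool uptag_matches(const struct uptag *tag, const struct hole *hole)
    {
            uint32_t mask = (1u << INUM_LOW_BITS) - 1;

            return (hole->inum & mask) == (tag->inum_low & mask) &&
                   hole->offset == tag->offset;
    }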