On May 28, 2006 22:07 -0400, Ric Wheeler wrote:
> I think that the namespace needs to present a normal file system set of
> operations - support for hardlinks, no magic directories, etc. so that
> applications don't need to load balance (or even be aware) of the
> sub-units that provide storage. If we removed that requirement, we
> would be back to today's collection of various file systems mounted on a
> single host.
>
> I know that lustre aggregates full file systems

Yes - we have a metadata-only filesystem which exports the inode numbers
and namespace, and then separate (essentially private) filesystems that
store all of the data. The object store filesystems do not export any
namespace that is visible to userspace.

> you could build a
> file system on top of a collection of disk partitions/LUN's and then
> your inode would could be extended to encode the partition number and
> the internal mapping. You could even harden the block groups to the
> point that fsck could heal one group while the file system was (mostly?)
> online backed up by the rest of the block groups...

This is one thing that we have been thinking of for ext3. Instead of a
filesystem-wide "error" bit we could move this per-group to only mark
the block or inode bitmaps in error if they have a checksum failure.
This would prevent allocations from that group to avoid further potential
corruption of the filesystem metadata.

Once an error is detected then a filesystem service thread or a userspace
helper would walk the inode table (starting in the current group, which
is most likely to hold the relevant data) recreating the respective bitmap
table and keeping a "valid bit" bitmap as well. Once all of the bits in
the bitmap are marked valid then we can start using this group again.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
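
For illustration only, the per-group error handling described above could be
sketched roughly as follows. This is a toy in-memory model in Python, not ext3
code: the Group class, its field names, and the use of CRC32 are all
assumptions made for the example, and it only covers the inode bitmap of a
single group (rebuilding a block bitmap would also have to look at inodes in
other groups).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical model of one block group (not ext3's on-disk layout)."""
    inode_table: list          # inode_table[i] is True if inode i is in use
    inode_bitmap: list         # allocation bitmap, one bool per inode
    bitmap_csum: int = 0       # checksum stored alongside the bitmap
    in_error: bool = False     # per-group flag instead of a filesystem-wide bit
    valid: list = field(default_factory=list)  # "valid bit" per bitmap entry


def csum(bitmap):
    # CRC32 is an assumption; the message does not name a checksum algorithm.
    return zlib.crc32(bytes(bitmap))


def check_group(g):
    """Mark the group in error if its bitmap fails the checksum."""
    if csum(g.inode_bitmap) != g.bitmap_csum:
        g.in_error = True
        g.valid = [False] * len(g.inode_bitmap)
    return not g.in_error


def allocate_inode(groups):
    """Allocate from the first healthy group; groups in error are skipped."""
    for gi, g in enumerate(groups):
        if g.in_error:
            continue
        for i, used in enumerate(g.inode_bitmap):
            if not used:
                g.inode_bitmap[i] = True
                g.inode_table[i] = True
                g.bitmap_csum = csum(g.inode_bitmap)
                return gi, i
    return None


def rebuild_step(g, i):
    """Recreate one bitmap entry from the inode table and mark it valid."""
    g.inode_bitmap[i] = g.inode_table[i]
    g.valid[i] = True
    if all(g.valid):
        # Every bit has been recreated, so the group can be used again.
        g.bitmap_csum = csum(g.inode_bitmap)
        g.in_error = False
```

A service thread or userspace helper would call rebuild_step() for each inode
in the damaged group; allocations keep going to the other groups until the
last bit is marked valid.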