On Jul 15, 2009  15:19 -0700, Sage Weil wrote:
> On Wed, 15 Jul 2009, Andreas Dilger wrote:
> > I'm thinking of using simple ASCII key=value pairs to store basic
> > layout information like chunk size, stripe count, mirror count,
> > RAID type, etc.  Some of them may not be applicable/usable by all
> > filesystems, but having a handful of "well known" keys and values
> > for a common xattr name would at least be better than what we have
> > now (which is nothing).
> >
> > Something like (not necessarily a firm proposal yet):
> >
> > trusted.common_layout:
> >     chunk_bytes=65536
> >     stripe_count=32
> >     mirror_count=3
> >     raid_type=1+0
> >
> > Is this something you would be interested to pursue?  I've also discussed
> > this with Panasas, and they had some interest in this as well.  Any GPFS
> > developers watching?
>
> This sounds like a good idea to me.  I think the main hurdle is going to
> be defining a generalized layout description that captures all the full
> space of layouts for each file systems, and also translates gracefully
> between them.  IIRC Lustre, for instance, will stripe over $stripe_count
> objects, while Ceph (and Panasas?) will stripe up to some $max_object_size
> and then move on to a new set of objects.  Or stagger chunk order in
> successive stripes, etc.

Well, I don't think we can capture all of the details for every
filesystem, but I'm hoping we can get some of the main parameters
working.  Having additional attributes that are more filesystem
specific is fine too (to a reasonable extent of course).

For parts of the layout that are generated programmatically, like the
Ceph/Panasas striping order, I don't think that has to be encoded
explicitly into the layout xattr, since I'd assume the pattern is
always the same between files (e.g. use $stripe_count objects until
$max_object_size bytes, then a different set of $stripe_count objects
for $max_object_size bytes).
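
To make the above concrete, here is a rough sketch (not a proposal for an
actual implementation) of how a consumer might parse the key=value pairs and
derive the object for a given file offset from just the few parameters
discussed.  The function names parse_layout() and locate() are made up for
illustration, plain round-robin striping is assumed, $max_object_size is
assumed to be a multiple of the chunk size, and mirror_count/raid_type are
not used by the mapping at all.

```python
def parse_layout(text):
    """Parse whitespace-separated key=value pairs into a dict of strings.

    Hypothetical helper: values are kept as opaque strings so that keys a
    given filesystem does not understand can be stored and returned as-is.
    """
    layout = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep or not key:
            raise ValueError(f"malformed layout entry: {token!r}")
        layout[key] = value
    return layout


def locate(offset, chunk_bytes, stripe_count, max_object_size=None):
    """Return (object_index, offset_in_object) for a byte offset in a file.

    Assumed rules, for illustration only:
    - max_object_size is None: the same stripe_count objects are used for
      the whole file (the Lustre-like case).
    - otherwise: each set of stripe_count objects holds
      stripe_count * max_object_size bytes, then a new set of objects is
      started (the Ceph/Panasas-like case).
    """
    object_set = 0
    if max_object_size is not None:
        if max_object_size % chunk_bytes:
            raise ValueError("max_object_size must be a multiple of chunk_bytes")
        set_bytes = stripe_count * max_object_size
        object_set, offset = divmod(offset, set_bytes)
    chunk_no, within_chunk = divmod(offset, chunk_bytes)
    stripe_no, column = divmod(chunk_no, stripe_count)
    object_index = object_set * stripe_count + column
    return object_index, stripe_no * chunk_bytes + within_chunk


if __name__ == "__main__":
    layout = parse_layout(
        "chunk_bytes=65536 stripe_count=32 mirror_count=3 raid_type=1+0"
    )
    chunk = int(layout["chunk_bytes"])
    count = int(layout["stripe_count"])
    # Second chunk of the file lands at the start of the second object.
    print(locate(chunk, chunk, count))
    # With a 128 KiB object size limit, byte 4 MiB starts a new object set.
    print(locate(4 * 1024 * 1024, chunk, count, max_object_size=2 * chunk))
```

The point of the sketch is only that the striping order itself does not need
to be in the xattr: given the parameters, the pattern can be recomputed.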

That Lustre uses the same $stripe_count objects for the whole file, and
that it would ignore $max_object_size, is below the level of detail that
I'm currently interested in.  In the reverse direction, I'd assume that
Ceph/Panasas would fill in the value for $max_object_size from a default,
as if no layout was used.

Filesystems are free to ignore parameters they don't like, and/or save
them and return them again when asked (probably with a flag that
indicates they are not currently in use), basically treating them as an
opaque user xattr.  This will preserve the settings across an
fsX -> fsY -> fsX transfer.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.