On Thu, Apr 21, 2011 at 2:09 PM, Colin McCabe <cmccabe@xxxxxxxxxxxxxx> wrote:
> On Thu, Apr 21, 2011 at 1:03 PM, Gregory Farnum
> <gregory.farnum@xxxxxxxxxxxxx> wrote:
>> I really don't see how pushing the naming complexity into the local filesystem,
>> where it adds lots of otherwise-useless inodes and dentries, is going to help us.
>
> Here is a quick summary of how TV's proposal would help us.
> 1. it avoids collisions entirely
> 2. You don't ever have to do an extra xattr lookup, no matter how short
> or long the object name is.

Yeah, but then you have to read more directories. Note that btrfs stores
the xattrs on the directories, so reading those xattrs will have a lower
IO impact than traversing directories recursively.

>
> My add-on proposal helps us:
> 3. get reasonable prefix search performance (with those supposedly
> "useless" dentries)
>
>> I like what Yehuda has here for its relative simplicity -- though I think we should just up
>> the hash size enough that we don't need to handle collisions,
>
> Personally, I think the xattr proposal is more complex. I guess that
> is a matter of taste.
>
> No matter how big your hash table will be, there are still collisions!
> That is the nature of hashing. And since the code is open source, it's
> pretty easy for an attacker to read the source and then create two
> objects whose names collide.

Sure there will be, and the code should handle them. With a good hashing
scheme, collisions will be pretty rare.

>
> So far, the only disadvantage that has been pointed out to TV's scheme
> is that it creates extra dentries. But those extra dentries only
> affect long object names, not the ones that (for example) the Ceph FS
> creates. Also, when long object names occur in S3, they don't tend to
> come out of the blue.
> They come about because the organization has a
> sort of directory structure like this:
>
> foocorp/business_data/business_reports/year_2008/input/foo
> foocorp/business_data/business_reports/year_2008/input/bar
>
> Of course we "know" that there are no such things as directories in
> S3. But people like to structure their object names as if there were.
> In cases like that, TV's scheme only incurs the cost of creating the
> extra dentries once per long prefix.
>

As I said above, in most cases reading xattrs should be more efficient.

Yehuda
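To make the "the code should handle them" point concrete, here is a minimal in-memory sketch of a hashed-filename scheme with collision handling. It is illustrative only and is not Ceph's code. Everything in it is a hypothetical choice for the sketch:

* `NAME_MAX` of 255 stands in for the local filesystem's per-component limit.
* `PREFIX_LEN`, the use of SHA-1, the `%`-escaping of `/`, and the `<prefix>_<hash>_<n>` naming are all assumptions.
* A plain dict stands in for the xattr holding the full object name.

Each probe of an occupied slot costs one attribute comparison, i.e. one xattr read on a real filesystem.

```python
import hashlib

NAME_MAX = 255   # assumed per-component filename limit of the local filesystem
PREFIX_LEN = 32  # hypothetical: how much of a long name stays human-readable


def escape(name):
    """Make an object name safe as a single filename ('/' is not allowed)."""
    return name.replace("%", "%25").replace("/", "%2F")


class HashedNameStore:
    """In-memory sketch of hashed filenames for long object names.

    Names whose escaped form fits in NAME_MAX are used directly. Longer names
    become '<prefix>_<hash>_<n>'; the full name is kept in a per-file dict
    entry (standing in for an xattr) so colliding names are told apart by
    probing n = 0, 1, 2, ... Deletion (which would leave holes in the probe
    sequence) and escaping of short names that happen to look like hashed
    ones are left out of this sketch.
    """

    def __init__(self, hash_len=16):
        self.hash_len = hash_len  # hex digits of SHA-1 kept; tiny values force collisions
        self.files = {}  # filename -> {"xattr": full object name, "data": bytes}
        self.xattr_reads = 0  # attribute comparisons so far (one per occupied slot probed)

    def _locate(self, name):
        """Return (filename, exists) for an object name."""
        esc = escape(name)
        if len(esc) <= NAME_MAX:
            return esc, esc in self.files
        digest = hashlib.sha1(name.encode()).hexdigest()[: self.hash_len]
        stem = f"{esc[:PREFIX_LEN]}_{digest}"
        n = 0
        while True:
            fname = f"{stem}_{n}"
            entry = self.files.get(fname)
            if entry is None:
                return fname, False  # first free slot for this stem
            self.xattr_reads += 1
            if entry["xattr"] == name:
                return fname, True
            n += 1  # collision: same stem, different object

    def put(self, name, data):
        fname, _ = self._locate(name)
        self.files[fname] = {"xattr": name, "data": data}
        return fname

    def get(self, name):
        fname, exists = self._locate(name)
        return self.files[fname]["data"] if exists else None
```

For example, `HashedNameStore(hash_len=1)` keeps only one hex digit of the hash. Storing 40 long names that share the `foocorp/business_data/...` prefix is then guaranteed to produce collisions, and every object is still found via the probe loop. With a wide hash such as the default 16 hex digits, collisions become rare, but the same loop still covers the adversarial case Colin raises.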