On Fri, Apr 22, 2011 at 8:44 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> Few things:
>
> - I think the xattr approach is always going to be faster. xattrs are
> stored adjacent to the inode in the btree, while creating intervening
> directories means a new inode is allocated, seeked to, and loaded, and
> _then_ the directory content is looked up in another part of the btree
> before the final inode is located. For each level you add two seeks
> (although in the common case, at least, those inodes will be close by).

Fair enough.

> - You can't make intervening directories both rare (long) and useful for
> prefix search (short) unless you really think people will be searching on
> 100+ character prefixes.

Earlier I suggested making it configurable, so that we could tune it to a short value on the cluster backing rgw and a long value elsewhere.

> - Hash collisions will be rare for all but our test cases. If we only
> hash for long filenames (say, 200+ characters) that means someone has to
> find a SHA-256 collision (has anybody??). And even then they only turn 1
> stat into 2. Only if someone can generate an arbitrary number of inputs
> that hash to the same value do they get anywhere. I don't think that's
> something we should worry about. If someone breaks a crypto hash there
> are much bigger things to worry about. (Even if we are super paranoid,
> then just sha(name + sha(name)).)

A good guide to choosing a crypto hash:
http://valerieaurora.org/hash.html

> - We can easily wrap the non-fast path with a mutex to avoid the races
> (because, again, collisions are vanishingly rare except in our test
> cases).

I believe all these operations are already done under the PG lock, so there are no race conditions in normal operation. TV is talking about the case where there has been a crash and we are resuming from some intermediate state. Based on our earlier discussion, perhaps this is not a problem on btrfs because of the snapshotting mechanism?
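For illustration, here is a minimal Python sketch of the long-name hashing scheme under discussion. Everything in it is hypothetical. The 200-character cutoff comes from Sage's "say, 200+" suggestion. The ToyStore class, PREFIX_LEN, the index suffix, and the in-memory dict that stands in for a directory plus xattrs are illustrative assumptions, not Ceph code. The sketch shows why a collision only turns one stat into two. The full name is kept in an xattr-like "full_name" field, so a mismatch simply moves the probe on to the next index. The long-name path is wrapped in a lock, following the mutex suggestion.

```python
import hashlib
import threading
from typing import Callable, Dict, Optional

# Hypothetical cutoff, following the "say, 200+ characters" suggestion.
LONG_NAME_THRESHOLD = 200
PREFIX_LEN = 32  # hypothetical: keep a readable prefix of the long name


def sha256_hex(name: str) -> str:
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


class ToyStore:
    """In-memory stand-in for one directory: filename -> xattrs.

    Each dictionary probe models one stat(). Names shorter than the
    threshold are stored verbatim (the fast path). Longer names are stored
    as prefix + digest + index, and the full name is kept in an xattr-like
    "full_name" entry, so a digest collision costs one extra probe.
    Escaping of short names that happen to look like hashed names, and
    removal (which must keep each index chain gap-free), are omitted.
    """

    def __init__(self, hash_fn: Callable[[str], str] = sha256_hex):
        self.hash_fn = hash_fn
        self.files: Dict[str, Dict[str, str]] = {}
        # Only the rare long-name path takes this lock.
        self.slow_path_lock = threading.Lock()

    def _hashed(self, name: str, index: int) -> str:
        return f"{name[:PREFIX_LEN]}_{self.hash_fn(name)}_{index}"

    def _probe(self, name: str, create: bool) -> Optional[str]:
        # Caller holds slow_path_lock. Walk index 0, 1, ... until we find
        # the entry whose stored full name matches, or a free slot.
        index = 0
        while True:
            candidate = self._hashed(name, index)
            attrs = self.files.get(candidate)
            if attrs is None:
                if not create:
                    return None
                self.files[candidate] = {"full_name": name}
                return candidate
            if attrs.get("full_name") == name:
                return candidate
            index += 1  # collision: same digest, different full name

    def lookup(self, name: str) -> Optional[str]:
        if len(name) < LONG_NAME_THRESHOLD:
            return name if name in self.files else None
        with self.slow_path_lock:
            return self._probe(name, create=False)

    def create(self, name: str) -> str:
        if len(name) < LONG_NAME_THRESHOLD:
            self.files.setdefault(name, {})
            return name
        with self.slow_path_lock:
            return self._probe(name, create=True)
```

You can exercise the probe chain by passing a deliberately weak hash_fn, such as `ToyStore(hash_fn=lambda n: "0")`. This forces every long name onto one digest. With SHA-256, the chain should not get past index 0 outside of tests. A removal that leaves a hole in a chain, for example after a crash partway through an operation, would make later entries unreachable. That is one concrete form of the intermediate-state problem mentioned above.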
cheers,
Colin