On the many files problem

Greetings Git hackers,

No doubt you have discussed this problem before, but I will pretend
that I couldn't find the relevant threads in the archive because
Marc's search is rather crude.

I'm coding an application that will potentially store quite a lot of
files in the same directory, so I wondered how I should go about it.
I tried a few different file systems and I tried path hashing, that
is, storing the file whose name hashes to d3b07384d113 in
d/d3/d3b07384d113.  As far as I can tell, that's what Git does.  It
turned out to be slower than anything except ext3 without dir_index.
You can see my results and the benchmarking code I used here:

  http://ygingras.net/b/2007/12/too-many-files:-reiser-fs-vs-hashed-paths
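
For reference, here is a minimal sketch of the hashing scheme I
benchmarked (the use of MD5 and the helper names are only
illustrative; the real benchmark code is at the URL above):

  import hashlib
  import os

  def hashed_path(root, name):
      # Hex digest of the file name; MD5 is just an example choice here.
      digest = hashlib.md5(name.encode()).hexdigest()
      # Nest under the first character, then the first two characters,
      # e.g. d/d3/d3b07384d113...
      return os.path.join(root, digest[0], digest[:2], digest)

  def store(root, name, data):
      # Create the intermediate directories on demand, then write the file.
      path = hashed_path(root, name)
      os.makedirs(os.path.dirname(path), exist_ok=True)
      with open(path, "wb") as f:
          f.write(data)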

At first glance, I would be tempted to say that hashing paths always
makes things slower, but the Git development team includes people
with really intimate knowledge of several file system implementations,
so I suspect you know something that I don't.

Can you describe how you hash the paths and what trick is used to
ensure fast creation of and access to the subdirectories?  Is path
hashing generally faster, or are you trying to avoid problems for
people using Git on baroque file systems?
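
For comparison, my current (possibly wrong) understanding of Git's
loose object layout is a single fan-out level of up to 256
subdirectories named after the first two hex characters of the
object's SHA-1, something like:

  import hashlib
  import os

  def loose_object_path(objects_dir, blob):
      # Git actually hashes a "blob <size>\0" header plus the content
      # and zlib-compresses the file on disk; both are glossed over here.
      sha = hashlib.sha1(blob).hexdigest()
      # The first two hex characters name the subdirectory and the
      # remaining 38 name the file, e.g. objects/d3/b07384d113...
      return os.path.join(objects_dir, sha[:2], sha[2:])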

Best regards, 

-- 
Yannick Gingras
