Yannick Gingras <ygingras@xxxxxxxxxxxx> writes:

> Greetings Git hackers,
>
> No doubt, you guys must have discussed this problem before but I will
> pretend that I can't find the relevant threads in the archive because
> Marc's search is kind of crude.
>
> I'm coding an application that will potentially store quite a bunch of
> files in the same directory so I wondered how I should do it. I tried
> a few different files systems and I tried path hashing, that is,
> storing the file that hashes to d3b07384d113 in d/d3/d3b07384d113. As
> far as I can tell, that's what Git does. It turned out to be slower
> than anything except ext3 without dir_index.

We hash like d3/b07384d113, but your understanding of what we do is
more or less right.

If we had never introduced packed object storage, this issue might have
mattered, and we might have looked into it further to improve loose
object access performance. But in reality, no sane git user would keep
millions of loose objects unpacked. And changing the layout would be a
backward-incompatible change for dumb transport clients. There is
practically no upside, and there are downsides, to changing it now.

Traditionally, avoiding large directories by path hashing when dealing
with a large number of files was tried and proven wisdom in many
applications (e.g. web proxies, news servers). Newer filesystems do
have tricks to let you quickly access a large number of files in a
single directory, and that lessens the need for applications to play
path hashing games.

That is a good thing, but if that trick makes the traditional way of
dealing with a large number of files _too costly_, it may be striking
the balance at the wrong point. That favors newly written applications
that assume large directories are OK (or ones written by people who do
not know the historical behaviour of filesystems), and punishes
existing practices too heavily.

The person who is guilty of introducing the hashed loose object store
is intimately familiar with Linux.
I do not speak for him, but if I had to guess, the reason he originally
chose path hashing was that he simply followed the tradition, and he
did not want to make the system too dependent on Linux or on a
particular feature of the underlying filesystem.
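
For illustration, the two layouts mentioned in this thread (d3/b07384d113
versus d/d3/d3b07384d113) can be sketched as below. This is only a minimal
Python sketch, not Git's actual code: the function names `git_style_path`
and `nested_path` and their parameters are made up for this example, and
the 12-character name is just the truncated hash from the quoted message.

```python
def git_style_path(hex_digest, prefix_len=2):
    """Split a hex digest into <prefix>/<rest>, e.g. d3/b07384d113.

    Hypothetical helper: the first `prefix_len` hex characters name the
    directory, and the remaining characters name the file inside it.
    """
    if len(hex_digest) <= prefix_len:
        raise ValueError("digest too short to split")
    return hex_digest[:prefix_len] + "/" + hex_digest[prefix_len:]


def nested_path(hex_digest, levels=2):
    """Nested layout from the quoted message, e.g. d/d3/d3b07384d113.

    Hypothetical helper: each directory level uses a prefix one character
    longer than the previous one, and the file keeps its full name.
    """
    if len(hex_digest) < levels:
        raise ValueError("digest too short for this many levels")
    parts = [hex_digest[:i] for i in range(1, levels + 1)]
    return "/".join(parts + [hex_digest])


if __name__ == "__main__":
    print(git_style_path("d3b07384d113"))
    print(nested_path("d3b07384d113"))
```

With a two-hex-character prefix there are at most 256 directories, so N
files are spread over roughly N/256 entries per directory.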