On Mon, 31 Dec 2007, Yannick Gingras wrote:
>
> This is really interesting and I would not have suspected it.  But it
> begs the question: why does Git use the base-16 hash instead of the
> base-64 hash?

Because I was stupid, and I'm a lot more used to hex numbers than to
base-64.

Also, the way the original encoding worked, the SHA1 of the object was
actually the SHA1 of the *compressed* object, so you could verify the
integrity of the object writing by just doing a

    sha1sum .git/objects/4f/a7491b073032b57c7fcf28c9222a5fa7b3a6b9

and it would return 4fa7491b073032b57c7fcf28c9222a5fa7b3a6b9 if
everything was good. That was a nice bonus in the first few days of git
development, when it all was a set of very low-level object routines hung
together with prayers and duct-tape.

So consider it historical. It wasn't worth fixing, since it became obvious
that the real fix would never be to try to make the individual files or
filenames smaller.

> > (Also, a "readdir() + stat()" loop will often get *much* worse access
> > patterns if you've mixed deletions and creations)
>
> This is something that will be interesting to benchmark later on.  So,
> an application with a lot of turnaround, say a mail server, should
> delete and re-create the directories from time to time?  I assume this
> is specific to some file system types.

This is an issue only for certain filesystems, and it's also an issue only
for certain access patterns.

A mail server, for example, will seldom *scan* the directory. It will just
open individual files by name. So it won't be hit by the "readdir+stat"
issue, unless you actually do a "ls -l".

(There are exceptions. Some mailbox formats use a file per email in a
directory. And yes, they tend to suck from a performance angle).

And you can avoid it.
For example, on most unixish filesystems, you can get better IO access
patterns by doing the readdir() into an array, then sorting it by inode
number, and then doing the stat() in that order: that *often* (but not
always - there's no guarantee what the inode number actually means) gives
you better disk access patterns.

                Linus
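
For illustration, the readdir-then-sort-then-stat idea can be sketched
roughly as below. This is only a sketch under assumptions, not code from
git or from this thread: it is written in Python, it uses os.scandir() as
a stand-in for a raw readdir() loop, and the function name
stat_in_inode_order is made up for the example. Whether the ordering
actually helps depends on the filesystem, as noted above.

```python
import os


def stat_in_inode_order(path):
    """Return a list of (name, stat_result) for the entries in `path`.

    Hypothetical helper: the stat() calls are issued in ascending
    inode-number order instead of directory order.
    """
    # Step 1: read the whole directory into an array first.
    # On POSIX systems, DirEntry.inode() reports the inode number taken
    # from the directory entry, so no stat() call is needed at this point.
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Step 2: sort the array by inode number.
    entries.sort()

    # Step 3: stat() each entry in that order.
    results = []
    for _inode, name in entries:
        try:
            st = os.lstat(os.path.join(path, name))
        except FileNotFoundError:
            # The entry was removed between the scan and the stat.
            continue
        results.append((name, st))
    return results
```

A caller that wants "ls -l"-style output would then sort the returned
list by name afterwards; the point is only that the stat() calls
themselves happen in inode order.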