Duy Nguyen <pclouds@xxxxxxxxx> writes:

> The major cost of writing an index is the SHA-1 hashing. The bigger
> the written part is, the higher cost we pay. So what if we write
> stat-only data to a separate file? Think of it as an index extension,
> only it stays outside the index. On webkit with 182k files, the stat
> data size would be about 6MB (its index v4 is 15M for comparison). But
> with stat-only we could employ some cheap but efficient compressing,
> sd_dev, sd_uid and sd_gid are likely the same for every entry. And we
> could store the stat data of updated entries only. So I'm hoping to
> get that 6MB down to a few hundred KBs. That makes hashing lightning
> fast.

It is perfectly OK to deflate your verbose stat data and store it in
the index as an index extension, so "storing 6MB that can be
compressed efficiently without compressing is dumb" applies whether
the result is stored in the index or in a separate file, I would
think.

Having said that, I do not think there is a fundamental reason why
the stat data has to live inside the same index file. A separate
file is just fine, as long as you can reliably detect that the two
went out of sync for whatever reason (e.g. "the index proper was
updated, but a stale stat file was left behind"), and storing the
trailer checksum from the corresponding index in this new file is an
obvious and good solution.

I am not sure if that should be called index.stat, though. It is
more about untracked files. The stat data for cached paths are in
the index proper, so what you are adding is not what we would call
"stat info" when we talk about the index.
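
To make the out-of-sync check concrete, here is a rough sketch in
Python (not git's C, and not an actual patch). It assumes only that
the index file ends with a 20-byte SHA-1 trailer. The side-file layout
(trailer first, then the deflated payload) and the names
index_trailer(), write_side_file() and read_side_file() are all
hypothetical, made up for illustration.

```python
import os
import zlib

TRAILER_LEN = 20  # the index file ends with a 20-byte SHA-1 checksum


def index_trailer(index_path):
    """Return the trailing checksum bytes of the index file."""
    if os.path.getsize(index_path) < TRAILER_LEN:
        raise ValueError("index file too short to carry a trailer")
    with open(index_path, "rb") as f:
        f.seek(-TRAILER_LEN, os.SEEK_END)
        return f.read(TRAILER_LEN)


def write_side_file(index_path, side_path, stat_payload):
    """Hypothetical layout: index trailer, then the deflated stat data."""
    with open(side_path, "wb") as f:
        f.write(index_trailer(index_path))
        f.write(zlib.compress(stat_payload))


def read_side_file(index_path, side_path):
    """Return the stat payload, or None if the side file is stale."""
    with open(side_path, "rb") as f:
        blob = f.read()
    if blob[:TRAILER_LEN] != index_trailer(index_path):
        # The index was rewritten after this file was; do not trust it.
        return None
    return zlib.decompress(blob[TRAILER_LEN:])
```

A reader that gets None back would simply fall back to recomputing
the data, which is what makes a stale file harmless. The same deflate
step would work just as well on an extension stored inside the index.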