Re: Store refreshed stat info in a separate file?

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 18 Apr 2014 10:43:50 -0700

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> The major cost of writing an index is the SHA-1 hashing. The bigger
> the written part is, the higher cost we pay. So what if we write
> stat-only data to a separate file? Think of it as an index extension,
> only it stays outside the index. On webkit with 182k files, the stat
> data size would be about 6MB (its index v4 is 15M for comparison). But
> with stat-only we could employ some cheap but efficient compression:
> sd_dev, sd_uid and sd_gid are likely the same for every entry. And we
> could store the stat data of updated entries only. So I'm hoping to
> get that 6MB down to a few hundred KBs. That makes hashing lightning
> fast.

It is perfectly OK to deflate your verbose stat data and store it in
the index as an index extension.  So "storing 6MB without compressing
it, when it compresses well, is dumb" applies whether the result is
stored in the index or in a separate file, I would think.
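
A rough sketch of what a deflated extension could look like, in Python
rather than git's C for brevity.  The extension framing (4-byte
signature, 32-bit big-endian size, then data) follows the index format;
the "XSTA" signature, the five-field record layout and the helper names
are assumptions made up for illustration, not anything git defines.

```python
import struct
import zlib

# Hypothetical record layout: dev, uid, gid, mtime seconds, size,
# each stored as a 32-bit big-endian integer.
RECORD = struct.Struct(">5I")

def pack_stat_records(records):
    """Concatenate fixed-width stat records into one byte string."""
    return b"".join(RECORD.pack(*rec) for rec in records)

def make_extension(signature, payload):
    """Frame deflated payload as: signature, 32-bit size, data."""
    body = zlib.compress(payload)
    return signature + struct.pack(">I", len(body)) + body

def read_extension(blob):
    """Return (signature, inflated payload) from a framed extension."""
    signature = blob[:4]
    (size,) = struct.unpack(">I", blob[4:8])
    return signature, zlib.decompress(blob[8:8 + size])

# dev, uid and gid repeat in every record, so deflate has plenty to work with.
records = [(2049, 1000, 1000, 1397842000 + i, 100 + i) for i in range(1000)]
raw = pack_stat_records(records)
ext = make_extension(b"XSTA", raw)  # "XSTA" is a made-up signature
```

The same framed blob could just as well be written to a separate file;
the compression argument does not depend on where it ends up.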

Having said that, I do not think there is a fundamental reason why
the stat data has to live inside the same index file.  A separate
file is just fine, as long as you can reliably detect that they went
out of sync for whatever reason (e.g. "the index proper updated, a
stale stat file left behind"), and storing the trailer checksum from
the corresponding index in this new file is an obvious and good
solution.
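
A minimal sketch of that check, again in Python for brevity and
operating on in-memory bytes.  It relies only on the index ending with
a 20-byte SHA-1 of everything before it; the layout of the side file
(checksum first, payload after) and all function names are assumptions
for illustration.

```python
import hashlib

SHA1_LEN = 20

def index_trailer(index_bytes):
    """The index ends with a SHA-1 over all of its preceding content."""
    return index_bytes[-SHA1_LEN:]

def write_side_file(index_bytes, payload):
    """Hypothetical side-file layout: index trailer checksum, then payload."""
    return index_trailer(index_bytes) + payload

def load_side_file(index_bytes, side_bytes):
    """Return the payload, or None if the side file does not match this index."""
    if len(side_bytes) < SHA1_LEN:
        return None
    if side_bytes[:SHA1_LEN] != index_trailer(index_bytes):
        return None  # the index was rewritten; the side file is stale
    return side_bytes[SHA1_LEN:]

def fake_index(body):
    """Stand-in for a real index file: body followed by its SHA-1."""
    return body + hashlib.sha1(body).digest()

old_index = fake_index(b"DIRC old contents")
side = write_side_file(old_index, b"stat data")
new_index = fake_index(b"DIRC new contents")
```

Loading `side` against `old_index` yields the payload, while loading it
against `new_index` yields None, so a reader can fall back to refreshing
from scratch when the two files disagree.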

I am not sure if that should be called index.stat, though.  It is
more about untracked files.  The stat data for cached paths are in
the index proper, so what you are adding is not what we would call
"stat info" when we talk about the index.