Re: Stat cache in .git/index hinders syncing of repositories

Christoph Groth <christoph@xxxxxxxxxxxxxx> · Fri, 24 Jan 2020 10:16:18 +0100

brian m. carlson wrote:
> On 2020-01-20 at 23:53:22, Christoph Groth wrote:

> > My point is that it’s not just private data: When I excluded
> > .git/index from synchronization, staging files for a commit was no
> > longer synchronized.
>
> (...)
>
> Storing all of this data in one file means that only one file need be
> mapped into memory and rewritten.  Git writes to the index by
> atomically creating a lock file alongside it and writing the new
> contents into it, and then doing an atomic replace.  This approach
> wouldn't be possible with multiple files, and any update to it
> wouldn't be atomic.
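
The lock-file-then-rename pattern brian describes can be sketched in Python as follows. This is a minimal illustration that assumes POSIX rename semantics. It is not Git's actual implementation, which is written in C, and the function name `write_atomically` is hypothetical.

```python
import os

def write_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or new contents."""
    lock_path = path + ".lock"
    # O_EXCL: creation fails with FileExistsError if another writer holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Make the new contents durable before they appear under the real name.
            os.fsync(f.fileno())
    except BaseException:
        # Release the lock if writing failed; the original file is untouched.
        os.unlink(lock_path)
        raise
    # Atomic on POSIX when both names are on the same file system.
    os.replace(lock_path, path)
```

A second writer that arrives while `index.lock` exists gets `FileExistsError` instead of clobbering the first writer's update. Readers of `index` see either the old file or the new one, never a mix. With several files, each would need its own rename, so a reader could observe some files updated and others not.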

Thanks a lot for the explanation.  To me, it still seems less
satisfying, from a design point of view, to mix state (i.e. which changes
have been staged) with an ephemeral cache that is specific to
a particular file system.  Without having thought deeply about it,
I have the impression that it would be enough for the stat cache and
the “staging state” of the repository to be updated atomically
independently of each other.

But I understand now that all of this hardly matters in practice (see
below), so I’m not motivated to work on this, and probably no one else
is. :-)

> However, having said that, nobody has provided a compelling case for
> using multiple files for storing different types of working tree
> state.  The existing options are available for cases like yours and
> others', and they work.  Since there are clear benefits to the current
> model, including simplicity and robustness, and few downsides, nobody
> has decided to change it.

Indeed, I see hardly any disadvantages in globally setting

	[core]
		trustctime = false
		checkstat = minimal

as I do now.  In fact, I wonder why git caches the sub-second part of
mtime, and ctime at all, in the first place.  Perhaps it matters for
scripted use of git, where several operations can occur within the
same second, but even then only changes that leave the file size
unchanged would be affected.
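
To make the trade-off concrete, here is a simplified Python model of what the two settings change in a stat comparison. It is a sketch under stated assumptions, not Git's actual check. The `StatInfo` record and `stat_matches` function are hypothetical. Real Git compares more fields (such as uid and gid) and adds a separate "racy-git" safeguard for files modified around the time the index was written, which this model omits.

```python
from dataclasses import dataclass

NS_PER_S = 1_000_000_000

@dataclass(frozen=True)
class StatInfo:
    """Hypothetical subset of the stat fields a stat cache might record."""
    size: int
    mtime_ns: int
    ctime_ns: int
    ino: int

def stat_matches(cached: StatInfo, current: StatInfo,
                 minimal: bool = False, trust_ctime: bool = True) -> bool:
    """Return True if `current` looks unchanged compared with `cached`.

    minimal=True models checkstat = minimal: ignore the inode and compare
    timestamps at whole-second granularity only.
    trust_ctime=False models trustctime = false: ignore ctime entirely.
    """
    if cached.size != current.size:
        return False
    if not minimal and cached.ino != current.ino:
        return False
    unit = NS_PER_S if minimal else 1  # drop the sub-second part when minimal
    if cached.mtime_ns // unit != current.mtime_ns // unit:
        return False
    if trust_ctime and cached.ctime_ns // unit != current.ctime_ns // unit:
        return False
    return True

# Two same-size writes 0.8 s apart, within the same wall-clock second.
before = StatInfo(size=12, mtime_ns=1_579_857_378_100_000_000,
                  ctime_ns=1_579_857_378_100_000_000, ino=42)
after = StatInfo(size=12, mtime_ns=1_579_857_378_900_000_000,
                 ctime_ns=1_579_857_378_900_000_000, ino=42)
print(stat_matches(before, after))                                   # False
print(stat_matches(before, after, minimal=True, trust_ctime=False))  # True
```

Under the minimal settings, the second write is reported as unchanged. This is the same-second, same-size case; with the full check, the sub-second mtime catches it.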

> I should add that even if, for some reason, we did add support for
> splitting this data out, I'm not sure if we'd support syncing only
> part of the repository state and blowing away other state.  We don't
> really support that now (other than through tools like fetch and
> clone) and I don't think we'd want to encourage that behavior in the
> future.

The stat cache file would not really be part of the state of the
repository, since deleting it would not change anything but would only
slow down the next operation.  (At least that’s my current understanding;
perhaps I’m still overlooking something.)
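
The idea that the stat cache is only an optimization can also be sketched. When a cache entry is missing, the fallback is to hash the file and compare it with the staged blob id. As long as any cache entry that is present is trustworthy, this gives the same answer, only more slowly. The `is_modified` helper and its `(size, mtime_ns)` cache tuple are hypothetical. Only `blob_id` follows Git's real object format for SHA-1 repositories: SHA-1 over a `blob <size>` header, a NUL byte, and the contents.

```python
import hashlib
import os
from typing import Optional, Tuple

def blob_id(data: bytes) -> str:
    """Object id Git gives a blob: SHA-1 of a 'blob <size>' header, NUL, contents."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()

def is_modified(path: str, staged_id: str,
                cached_stat: Optional[Tuple[int, int]]) -> bool:
    """Report whether the file at `path` differs from the staged blob."""
    st = os.stat(path)
    if cached_stat == (st.st_size, st.st_mtime_ns):
        # Fast path: stat data unchanged, so trust the cache and skip reading.
        return False
    # Slow path: no cache entry (or a stale one), so hash the contents.
    with open(path, "rb") as f:
        return blob_id(f.read()) != staged_id
```

Deleting the cache corresponds to passing `cached_stat=None`. The result is the same, but every file has to be read and hashed.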

Brian, Johannes, Junio, thanks a lot for taking the time to clarify this
issue.

Christoph