Re: Stat cache in .git/index hinders syncing of repositories

Johannes Schindelin wrote:
>
> On Sat, 18 Jan 2020, Christoph Groth wrote:
>
> > OK, I see.  But please consider (one day) to split up the index file
> > to separate the local stat cache from the globally valid data.
>
> I am sure that this has been considered even before Git was publicly
> announced,

I would be very interested to hear the rationale for keeping the
information about what is staged and the stat cache together in the same
file.  I, or someone else, might actually work on a patch one day, but
before starting, it would be good to understand the reasoning behind the
current design.

> and I would wager a guess that it was determined that it would be
> better to keep all of Git's private data in one place.

My point is that it’s not just private data: When I excluded .git/index
from synchronization, staging files for a commit was no longer
synchronized.

> > (By the way, even after 12 years of using Git intensely I am
> > confused about what actually is the index.  I believed that it is
> > the "staging area", like in "git-add - Add file contents to the
> > index".  But then the .git/index file reflects all the tracked
> > files, and not just staged ones.  This usage is also reflected by
> > the command "git update-index".)
>
> The concept of the Git index is slightly different from what is
> actually stored inside `.git/index`. You should consider the latter to
> be an implementation detail that is of concern only if you want to
> work on internals. Otherwise the description of the index as a staging
> area is a pretty good image.

To me, it does not seem to be a mere implementation detail.  For
example, the command 'git update-index --refresh' is part of the "public
API", and its action is to update the stat cache.  It does not modify
what is staged.
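
To illustrate the distinction, here is a minimal Python sketch.  It is
a toy model, not Git's actual code: the Entry class and the stage() and
refresh() helpers are hypothetical names made up for this example.  Only
the blob hashing follows what Git does in SHA-1 repositories.  The point
is that a refresh touches the cached stat fields and never the staged
blob id.

```python
import hashlib
import os
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    """SHA-1 that Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


@dataclass
class Entry:            # hypothetical, heavily simplified index entry
    path: str
    staged_id: str      # what is staged: meaningful in any copy of the repo
    mtime_ns: int       # stat cache: meaningful only on this machine
    size: int


def stage(path: str) -> Entry:
    """Record both the content id and the current stat data of a file."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    return Entry(path, blob_id(data), st.st_mtime_ns, st.st_size)


def refresh(entry: Entry) -> str:
    """Toy analogue of 'git update-index --refresh' for a single entry."""
    st = os.stat(entry.path)
    if (st.st_mtime_ns, st.st_size) == (entry.mtime_ns, entry.size):
        return "unchanged"
    with open(entry.path, "rb") as f:
        same_content = blob_id(f.read()) == entry.staged_id
    if same_content:
        # Only the stat cache is rewritten; staged_id is left alone.
        entry.mtime_ns, entry.size = st.st_mtime_ns, st.st_size
        return "stat refreshed"
    return "needs update"
```

In this model, a file whose timestamp changed during synchronization
but whose content did not yields "stat refreshed", and staged_id stays
the same in every case.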

> > Still, this is a workaround, and the price is reduced robustness of
> > file modification detection.
>
> You misunderstand how Git detects whether a file is modified or not.
>
> A file is re-hashed if its mtime is newer than, _or equal to_, the
> mtime of `.git/index`.

You must mean "the mtime in '.git/index'", but OK, I see.  Makes sense
of course.  So setting core.trustctime to false and core.checkstat to
minimal only means that some avoidable rehashings may be made.  But this
would require two modifications of a file in the same second, without
a change to the file size.
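
As a rough sketch of the check discussed here (a simplified model in
Python, not Git's implementation): with core.checkstat set to minimal
and core.trustctime set to false, only the whole-second mtime and the
file size are compared.  CachedStat and must_rehash() are hypothetical
names, and the reference timestamp is passed in as a parameter, since
which timestamp is meant is exactly what is being discussed above.

```python
from dataclasses import dataclass


@dataclass
class CachedStat:       # hypothetical: the two fields left by "minimal"
    mtime_sec: int
    size: int


def must_rehash(cached: CachedStat, mtime_sec: int, size: int,
                reference_mtime_sec: int) -> bool:
    """Return True if the file content has to be compared again."""
    if (mtime_sec, size) != (cached.mtime_sec, cached.size):
        return True     # stat data differ: the file may be modified
    # Stat data match, but an entry that is not older than the reference
    # timestamp could have been changed again within the same second.
    return cached.mtime_sec >= reference_mtime_sec
```

Under these assumptions, a change can go unnoticed only if the stat
data match and the entry is older than the reference timestamp.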

> In general, I am not sure that you are using the right tool for
> synchronizing. If you cannot guarantee that a snapshot of the
> directory is copied, you will always run the risk of inconsistent
> data, which is worse than not having a backup at all: at least without
> a backup you do not have a false sense of security.

I do not understand what makes you think so.

Unison is very robust software: I have never had any problems with it
and have never heard of anyone having any.  Moreover, as I noted in the
opening
message of this thread, it recently gained an option to treat chosen
directories as atomic.  I’m using this for ".git" subdirectories.

Christoph


