Stat cache in .git/index hinders syncing of repositories

Hello,

I am using unison to sync home directories across multiple machines.
This includes a fair number of git repositories and works very well.
Unison recently acquired a new feature that allows selected
subdirectories (like .git) to be treated atomically.  This makes the
syncing perfectly safe.

Some people say that one should use git itself to sync git working
directories, but IMHO these people overlook the difference between
collaboration (using git) and being able to continue one’s own
unfinished work on a different machine, including uncommitted files,
stashes, and - if it has to be - in the middle of a merge.  Moreover, it
is simpler not to have to treat git repositories specially when syncing.
Syncing git repositories is thus clearly useful.

However, there is one problem with syncing git repositories that has
been noticed by multiple people [1]: the file .git/index contains not
only the “git index”, but also a cache of stat-data of the files in the
working directory.  Some file synchronizers are able to sync mtimes, but
syncing ctimes would be bizarre (if it is even possible).
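
To make the point concrete, here is a minimal Python sketch (not taken
from git or unison) that copies a file while preserving its mtime, much
as a synchronizer would, and then compares some of the stat fields that
git caches per index entry.  The file names are made up for the example.
The copy keeps its size and mtime, but it is a different file to the
filesystem, so its inode number differs (and, in general, so does its
ctime).

```python
import os
import shutil
import tempfile


def stat_fields(path):
    """Return a few of the stat fields that git caches per index entry."""
    st = os.stat(path)
    return {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "ctime_ns": st.st_ctime_ns,
        "ino": st.st_ino,
    }


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")  # hypothetical file names
    copy = os.path.join(tmp, "copy.txt")
    with open(original, "w") as f:
        f.write("hello\n")
    # copy2 preserves the mtime, like a synchronizer that syncs mtimes.
    shutil.copy2(original, copy)
    a = stat_fields(original)
    b = stat_fields(copy)

print("original:", a)
print("copy:    ", b)
```

A stat cache recorded for the original therefore does not describe the
copy, even though the contents are identical.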

So, say that machines A and B are synced.  A new git repository appears
on machine A.  The synchronizer is run which results in copying all the
files of the new repo verbatim to machine B.  Note that now on machine
B the cache inside the file .git/index contains invalid stat
information.  So when "git status" is run on B, .git/index gets
rewritten, and the next sync operation copies it back to A, where again
it is rewritten even by something as harmless as "git status".  And so
on, and so forth...
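
The ping-pong can be sketched with a toy model.  This is only an
illustration under simplifying assumptions: the Index class, the
git_status function and the inode numbers are invented for the example,
a single inode number stands in for the whole stat cache, and a "sync"
is modelled as handing the index object to the other machine unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Index:
    staged: str      # repository state that should be synced
    cached_ino: int  # stand-in for the machine-specific stat cache


def git_status(index, local_ino):
    """Model of "git status": refresh the cache if it does not match."""
    if index.cached_ino != local_ino:
        return replace(index, cached_ino=local_ino)
    return index


# The same working file has different inode numbers on machines A and B.
INO_A, INO_B = 1001, 2002

index_a = Index(staged="v1", cached_ino=INO_A)
rewrites = 0
for _ in range(3):
    # Sync A -> B copies the index verbatim, then "git status" runs on B.
    index_b = git_status(index_a, INO_B)
    rewrites += index_b != index_a
    # Sync B -> A copies it back, then "git status" runs on A.
    index_a_new = git_status(index_b, INO_A)
    rewrites += index_a_new != index_b
    index_a = index_a_new

print("index rewrites:", rewrites, "staged state:", index_a.staged)
```

In this model the index is rewritten on every "git status" after every
sync, although the staged state never changes.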

In my opinion the root of this ping-pong problem is that .git/index
mixes information about the status of the repository (=what has been
staged) that should be synced with a cache of machine-specific
filesystem metadata.

I am not an expert on git internals, but perhaps it would be a good idea
to move the cache into a separate file that could be put on an "ignore"
list for synchronizers?  It seems to me that this has already been
proposed in a different context [2], and I would not be surprised if
factoring out the cache had other beneficial effects.

If it is not feasible to separate the cache, perhaps another possibility
would be to add a new possible value for core.checkStat that would
disable stat structure checking except for file sizes?
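
What such a value would change can be sketched as follows.  This is a
simplified model and not git's actual code: the "default" and "minimal"
branches follow my reading of the git-config documentation for
core.checkStat and core.trustctime, while "size-only" is the
hypothetical new value proposed above and does not exist in git.  The
StatData class, the looks_unchanged function and all numbers are made
up for the example.

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class StatData:
    size: int
    mtime_ns: int
    ctime_ns: int
    ino: int
    uid: int
    gid: int


def looks_unchanged(cached, current, check_stat="default", trust_ctime=True):
    """Simplified model: does the cached stat data still match the file?"""
    if cached.size != current.size:
        return False
    if check_stat == "size-only":  # hypothetical value, not in git
        return True
    if check_stat == "minimal":
        # Only whole seconds of mtime (and of ctime, if trusted) count.
        same = cached.mtime_ns // NS_PER_SEC == current.mtime_ns // NS_PER_SEC
        if trust_ctime:
            same = same and (
                cached.ctime_ns // NS_PER_SEC == current.ctime_ns // NS_PER_SEC
            )
        return same
    # "default": also compare sub-second times, inode, uid and gid.
    same = (
        cached.mtime_ns == current.mtime_ns
        and cached.ino == current.ino
        and cached.uid == current.uid
        and cached.gid == current.gid
    )
    if trust_ctime:
        same = same and cached.ctime_ns == current.ctime_ns
    return same


# Stat data cached on machine A versus the synced copy on machine B:
# same size and mtime, but a different ctime and inode number.
on_a = StatData(6, 1_700_000_000_123_456_789, 1_700_000_000_123_456_789,
                1001, 1000, 1000)
on_b = StatData(6, 1_700_000_000_123_456_789, 1_700_000_500_000_000_000,
                2002, 1000, 1000)

print(looks_unchanged(on_a, on_b))                      # default
print(looks_unchanged(on_a, on_b, "minimal", False))    # ctime not trusted
print(looks_unchanged(on_a, on_b, "size-only"))         # proposed value
```

In this model the default check treats the synced copy as changed,
whereas a size-only check would accept it regardless of what the
synchronizer did to the timestamps.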

As a workaround for now, I exclude .git/index from syncing.  This seems
to work quite well, but I would be scared to sync unfinished merges like
this.

Thanks
Christoph

[1] https://stackoverflow.com/questions/12126247/why-does-git-index-change-when-i-havent-done-anything-to-my-repository
[2] https://www.mail-archive.com/git@xxxxxxxxxxxxxxx/msg48065.html


