Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/07, Thomas Gummerer wrote:
> read_index_from() takes a path argument for the location of the index
> file.  For reading the shared index in split index mode however it just
> ignores that path argument, and reads it from the gitdir of the current
> repository.
> 
> This works as long as an index in the_repository is read.  Once that
> changes, such as when we read the index of a submodule, or of a
> different working tree than the current one, the gitdir of
> the_repository will no longer contain the appropriate shared index,
> and git will fail to read it.
> 
> For example t3007-ls-files-recurse-submodules.sh was broken with
> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> broken in a similar manner, probably by introducing struct repository
> there, although I didn't track down the exact commit for that.
> 
> be489d02d2 ("revision.c: --indexed-objects add objects from all
> worktrees", 2017-08-23) breaks with split index mode in a similar
> manner, not erroring out when it can't read the index, but instead
> carrying on with pruning, without taking the index of the worktree into
> account.
> 
> Fix this by passing an additional gitdir parameter to read_index_from,
> to indicate where it should look for and read the shared index from.
> 
> read_cache_from() defaults to using the gitdir of the_repository.  As it
> is mostly a convenience macro, having to pass get_git_dir() for every
> call seems overkill, and if necessary users can have more control by
> using read_index_from().

I'm not saying we need to change this again but I got to thinking about
what the root cause for this bug is and I think it's a design flaw in
how split index is implemented.  IIUC Split index is an index extension
that can be enabled to limit the size of the index file that is written
when making changes to the index.  It breaks the index into two pieces,
index (which contains only changes) and sharedindex.XXXXX (which
contains unchanged information) where 'XXXXX' is a value found in the
index file.  If we don't do anything fancy then these two files live
next to one another in a repository's git directory at $GIT_DIR/index
and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
except that this isn't always the case and the read_index_from function
takes this into account by enabling a caller to specify a path to where
the index file is located.  We can do this by specifying the index file
we want to use by setting GIT_INDEX_FILE.

Being able to specify an index file via the environment is a feature
that has existed for a very long time (one that I personally think makes
things difficult because of things like additions like the sharedindex)
and I don't think was taken into account when introducing the split
index extension.  In this case if i were to specify a location of an
index file in my home directory '~/index' and be using the split index
feature then the corresponding sharedindex file would live in my
repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
sharedindex file is always located relative to the project's git
directory and not the index file itself, which is kind of confusing.
Maybe a better design would be to have the sharedindex file located
relative to the index file.

Anyway, some food for thought.

-- 
Brandon Williams



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux