Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/17, Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> > On 01/07, Thomas Gummerer wrote:
> >> read_index_from() takes a path argument for the location of the index
> >> file.  For reading the shared index in split index mode however it just
> >> ignores that path argument, and reads it from the gitdir of the current
> >> repository.
> >>
> >> This works as long as an index in the_repository is read.  Once that
> >> changes, such as when we read the index of a submodule, or of a
> >> different working tree than the current one, the gitdir of
> >> the_repository will no longer contain the appropriate shared index,
> >> and git will fail to read it.
> >>
> >> For example t3007-ls-files-recurse-submodules.sh was broken with
> >> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> >> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> >> broken in a similar manner, probably by introducing struct repository
> >> there, although I didn't track down the exact commit for that.
> >>
> >> be489d02d2 ("revision.c: --indexed-objects add objects from all
> >> worktrees", 2017-08-23) breaks with split index mode in a similar
> >> manner, not erroring out when it can't read the index, but instead
> >> carrying on with pruning, without taking the index of the worktree into
> >> account.
> >>
> >> Fix this by passing an additional gitdir parameter to read_index_from,
> >> to indicate where it should look for and read the shared index from.
> >>
> >> read_cache_from() defaults to using the gitdir of the_repository.  As it
> >> is mostly a convenience macro, having to pass get_git_dir() for every
> >> call seems overkill, and if necessary users can have more control by
> >> using read_index_from().
> >
> > I'm not saying we need to change this again but I got to thinking about
> > what the root cause for this bug is and I think it's a design flaw in
> > how split index is implemented.  IIUC Split index is an index extension
> > that can be enabled to limit the size of the index file that is written
> > when making changes to the index.  It breaks the index into two pieces,
> > index (which contains only changes) and sharedindex.XXXXX (which
> > contains unchanged information) where 'XXXXX' is a value found in the
> > index file.  If we don't do anything fancy then these two files live
> > next to one another in a repository's git directory at $GIT_DIR/index
> > and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
> > except that this isn't always the case and the read_index_from function
> > takes this into account by enabling a caller to specify a path to where
> > the index file is located.  We can do this by specifying the index file
> > we want to use by setting GIT_INDEX_FILE.
> >
> > Being able to specify an index file via the environment is a feature
> > that has existed for a very long time (one that I personally think makes
> > things difficult because of things like additions like the sharedindex)
> > and I don't think was taken into account when introducing the split
> > index extension.
> 
> It was (partly because I did use GIT_INDEX_FILE on occasions).
> 
> > In this case if i were to specify a location of an
> > index file in my home directory '~/index' and be using the split index
> > feature then the corresponding sharedindex file would live in my
> > repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
> > sharedindex file is always located relative to the project's git
> > directory and not the index file itself, which is kind of confusing.
> > Maybe a better design would be to have the sharedindex file located
> > relative to the index file.
> 
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename
> which we use in plenty of places, we can't achieve that by moving two
> files). A new dependency to $GIT_DIR is not that confusing to me, the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified, I think I did consider that but I
> dropped it because we'd lose the performance gain by splitting.
> 
> > Anyway, some food for thought.
> >
> > --
> > Brandon Williams
> 
> 
> 
> -- 
> Duy

Thanks for giving me some more context :) makes me feel more confident
in the solution that's already been proposed.

-- 
Brandon Williams



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux