Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> On 01/07, Thomas Gummerer wrote:
>> read_index_from() takes a path argument for the location of the index
>> file.  For reading the shared index in split index mode however it just
>> ignores that path argument, and reads it from the gitdir of the current
>> repository.
>>
>> This works as long as an index in the_repository is read.  Once that
>> changes, such as when we read the index of a submodule, or of a
>> different working tree than the current one, the gitdir of
>> the_repository will no longer contain the appropriate shared index,
>> and git will fail to read it.
>>
>> For example t3007-ls-files-recurse-submodules.sh was broken with
>> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
>> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
>> broken in a similar manner, probably by introducing struct repository
>> there, although I didn't track down the exact commit for that.
>>
>> be489d02d2 ("revision.c: --indexed-objects add objects from all
>> worktrees", 2017-08-23) breaks with split index mode in a similar
>> manner, not erroring out when it can't read the index, but instead
>> carrying on with pruning, without taking the index of the worktree into
>> account.
>>
>> Fix this by passing an additional gitdir parameter to read_index_from,
>> to indicate where it should look for and read the shared index from.
>>
>> read_cache_from() defaults to using the gitdir of the_repository.  As it
>> is mostly a convenience macro, having to pass get_git_dir() for every
>> call seems overkill, and if necessary users can have more control by
>> using read_index_from().
>
> I'm not saying we need to change this again but I got to thinking about
> what the root cause for this bug is and I think it's a design flaw in
> how split index is implemented.  IIUC Split index is an index extension
> that can be enabled to limit the size of the index file that is written
> when making changes to the index.  It breaks the index into two pieces,
> index (which contains only changes) and sharedindex.XXXXX (which
> contains unchanged information) where 'XXXXX' is a value found in the
> index file.  If we don't do anything fancy then these two files live
> next to one another in a repository's git directory at $GIT_DIR/index
> and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
> except that this isn't always the case and the read_index_from function
> takes this into account by enabling a caller to specify a path to where
> the index file is located.  We can do this by specifying the index file
> we want to use by setting GIT_INDEX_FILE.
>
> Being able to specify an index file via the environment is a feature
> that has existed for a very long time (one that I personally think makes
> things difficult because of things like additions like the sharedindex)
> and I don't think was taken into account when introducing the split
> index extension.

It was (partly because I did use GIT_INDEX_FILE on occasions).

> In this case if i were to specify a location of an
> index file in my home directory '~/index' and be using the split index
> feature then the corresponding sharedindex file would live in my
> repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
> sharedindex file is always located relative to the project's git
> directory and not the index file itself, which is kind of confusing.
> Maybe a better design would be to have the sharedindex file located
> relative to the index file.

That adds more problems. Now when you move the index file around you
have to move the shared index file too (think about atomic rename
which we use in plenty of places, we can't achieve that by moving two
files). A new dependency to $GIT_DIR is not that confusing to me, the
index file is useless anyway if you don't have access to
$GIT_DIR/objects. There was always the option to _not_ split the index
when $GIT_INDEX_FILE is specified, I think I did consider that but I
dropped it because we'd lose the performance gain by splitting.

> Anyway, some food for thought.
>
> --
> Brandon Williams



-- 
Duy



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux