On 01/17, Duy Nguyen wrote: > On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@xxxxxxxxxx> wrote: > > On 01/07, Thomas Gummerer wrote: > >> read_index_from() takes a path argument for the location of the index > >> file. For reading the shared index in split index mode however it just > >> ignores that path argument, and reads it from the gitdir of the current > >> repository. > >> > >> This works as long as an index in the_repository is read. Once that > >> changes, such as when we read the index of a submodule, or of a > >> different working tree than the current one, the gitdir of > >> the_repository will no longer contain the appropriate shared index, > >> and git will fail to read it. > >> > >> For example t3007-ls-files-recurse-submodules.sh was broken with > >> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository > >> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also > >> broken in a similar manner, probably by introducing struct repository > >> there, although I didn't track down the exact commit for that. > >> > >> be489d02d2 ("revision.c: --indexed-objects add objects from all > >> worktrees", 2017-08-23) breaks with split index mode in a similar > >> manner, not erroring out when it can't read the index, but instead > >> carrying on with pruning, without taking the index of the worktree into > >> account. > >> > >> Fix this by passing an additional gitdir parameter to read_index_from, > >> to indicate where it should look for and read the shared index from. > >> > >> read_cache_from() defaults to using the gitdir of the_repository. As it > >> is mostly a convenience macro, having to pass get_git_dir() for every > >> call seems overkill, and if necessary users can have more control by > >> using read_index_from(). > > > > I'm not saying we need to change this again but I got to thinking about > > what the root cause for this bug is and I think it's a design flaw in > > how split index is implemented. IIUC Split index is an index extension > > that can be enabled to limit the size of the index file that is written > > when making changes to the index. It breaks the index into two pieces, > > index (which contains only changes) and sharedindex.XXXXX (which > > contains unchanged information) where 'XXXXX' is a value found in the > > index file. If we don't do anything fancy then these two files live > > next to one another in a repository's git directory at $GIT_DIR/index > > and $GIT_DIR/sharedindex.XXXXX. This seems to work all well and fine > > except that this isn't always the case and the read_index_from function > > takes this into account by enabling a caller to specify a path to where > > the index file is located. We can do this by specifying the index file > > we want to use by setting GIT_INDEX_FILE. > > > > Being able to specify an index file via the environment is a feature > > that has existed for a very long time (one that I personally think makes > > things difficult because of things like additions like the sharedindex) > > and I don't think was taken into account when introducing the split > > index extension. > > It was (partly because I did use GIT_INDEX_FILE on occasions). > > > In this case if i were to specify a location of an > > index file in my home directory '~/index' and be using the split index > > feature then the corresponding sharedindex file would live in my > > repository's git directory '~/project/.git/sharedindex.XXXXX'. So the > > sharedindex file is always located relative to the project's git > > directory and not the index file itself, which is kind of confusing. > > Maybe a better design would be to have the sharedindex file located > > relative to the index file. > > That adds more problems. Now when you move the index file around you > have to move the shared index file too (think about atomic rename > which we use in plenty of places, we can't achieve that by moving two > files). A new dependency to $GIT_DIR is not that confusing to me, the > index file is useless anyway if you don't have access to > $GIT_DIR/objects. There was always the option to _not_ split the index > when $GIT_INDEX_FILE is specified, I think I did consider that but I > dropped it because we'd lose the performance gain by splitting. > > > Anyway, some food for thought. > > > > -- > > Brandon Williams > > > > -- > Duy Thanks for giving me some more context :) makes me feel more confident in the solution that's already been proposed. -- Brandon Williams