Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

Duy Nguyen <pclouds@xxxxxxxxx> · Thu, 18 Jan 2018 17:19:22 +0700



On Thu, Jan 18, 2018 at 1:16 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Hi,
>
> Duy Nguyen wrote:
>> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
>
>>>                                  IIUC Split index is an index extension
>>> that can be enabled to limit the size of the index file that is written
>>> when making changes to the index.  It breaks the index into two pieces,
>>> index (which contains only changes) and sharedindex.XXXXX (which
>>> contains unchanged information) where 'XXXXX' is a value found in the
>>> index file.  If we don't do anything fancy then these two files live
>>> next to one another in a repository's git directory at $GIT_DIR/index
>>> and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
>>> except that this isn't always the case and the read_index_from function
>>> takes this into account by enabling a caller to specify a path to where
>>> the index file is located.  We can do this by specifying the index file
>>> we want to use by setting GIT_INDEX_FILE.
> [...]
>>> In this case if i were to specify a location of an
>>> index file in my home directory '~/index' and be using the split index
>>> feature then the corresponding sharedindex file would live in my
>>> repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
>>> sharedindex file is always located relative to the project's git
>>> directory and not the index file itself, which is kind of confusing.
>>> Maybe a better design would be to have the sharedindex file located
>>> relative to the index file.
>>
>> That adds more problems. Now when you move the index file around you
>> have to move the shared index file too (think about atomic rename
>> which we use in plenty of places, we can't achieve that by moving two
>> files). A new dependency to $GIT_DIR is not that confusing to me, the
>> index file is useless anyway if you don't have access to
>> $GIT_DIR/objects. There was always the option to _not_ split the index
>> when $GIT_INDEX_FILE is specified, I think I did consider that but I
>> dropped it because we'd lose the performance gain by splitting.
>
> Can you elaborate a little more on this?
>
> At first glance, it seems simpler to say "paths in index extensions
> named in the index file are relative to the location of the index
> file" and to make moving the index file also require moving the shared
> index file, exactly as you say.  So at least from a "principle of
> least surprise" perspective I would be tempted to go that way.
>
> It's true that we rely on atomic rename in plenty of places, but only
> within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
> renames across directories are a lost cause.)
>
> Fortunately index files (including temp index files used by scripts)
> tend to only be in $GIT_DIR, for exactly that reason.  So I am
> wondering if switching to index-file-relative semantics would be an
> invasive move and what the pros and cons of such a move are.

I think it gets messier. Now you have to move two files. If the first
move succeeds but the second one fails, recovery may involve un-moving
the first file, but the old content it replaced is already gone. We
can probably get around that. But since the shared index is assumed to
be big and heavy, I just went with "store it in the place it's going
to be and never move it anywhere ever (until nobody uses it, at which
point it's deleted)".
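
To make the failure mode concrete, here is a rough sketch (not git
code) of a two-file move where the second rename fails. The helper
names, the temporary directory layout and the simulated failure are
all made up for illustration; the only real call doing the work is
Python's os.replace().

```python
import os
import tempfile


def move_pair(src_dir, dst_dir, names, fail_on=None):
    """Move files one by one; each rename is atomic, the pair is not.

    fail_on is a hypothetical knob that simulates one rename failing.
    """
    for name in names:
        if name == fail_on:
            raise OSError("simulated failure moving " + name)
        os.replace(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dst = os.path.join(root, "dst")
        os.mkdir(src)
        os.mkdir(dst)

        # The destination already holds a working, self-contained index.
        with open(os.path.join(dst, "index"), "w") as f:
            f.write("old")
        # The source holds a split index plus the shared file it needs.
        with open(os.path.join(src, "index"), "w") as f:
            f.write("new")
        with open(os.path.join(src, "sharedindex.XXXXX"), "w") as f:
            f.write("shared")

        try:
            move_pair(src, dst, ["index", "sharedindex.XXXXX"],
                      fail_on="sharedindex.XXXXX")
        except OSError:
            pass  # a real caller would now have to try to recover

        with open(os.path.join(dst, "index")) as f:
            index_content = f.read()
        shared_present = os.path.exists(
            os.path.join(dst, "sharedindex.XXXXX"))
        return index_content, shared_present


if __name__ == "__main__":
    print(demo())
```

After the failed second rename the destination has the new index but
no shared index next to it, and the old index that was overwritten
cannot be brought back by simply moving the first file again.
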
-- 
Duy