Re: [PATCH v1 16/19] read-cache: unlink old sharedindex files

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 27 Oct 2016 09:13:10 -0700

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> Christian, assuming we go with Junio's suggestion to disable
> split-index on temporary files, the only files left we have to take
> care of are index and index.lock. I believe pruning here in this code
> will have an advantage over in "git gc --auto" because when this is
> executed, we know we're holding index.lock, so nobody else is updating
> the index, it's race-free.
>
> All we need to do is peek in $GIT_DIR/index to see what shared index
> file it requires and keep it alive too; the rest of the shared index
> files can be deleted safely. We don't even need to fall back to mtime.

Yes, that was exactly why I wondered if we can afford to limit
splitting only to the primary index, because it makes things a
lot simpler.
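
To make the pruning step concrete, here is a rough sketch of the logic
in Python (not git's C code). It assumes the shared index files are
named "sharedindex.<hash>" under $GIT_DIR and that the caller already
knows which hashes are still required; the function name and the "keep"
parameter are hypothetical, and reading the hash out of the index is
left out on purpose.

```python
import os


def prune_shared_indexes(git_dir, keep):
    """Delete sharedindex.<hash> files in git_dir whose hash is not in keep.

    `keep` is the set of shared-index hashes still referenced by
    $GIT_DIR/index (and by index.lock, if one is being written).
    Returns the sorted list of file names that were removed.
    """
    prefix = "sharedindex."
    removed = []
    for name in sorted(os.listdir(git_dir)):
        if not name.startswith(prefix):
            continue  # not a shared index file, leave it alone
        if name[len(prefix):] in keep:
            continue  # still required by an index we care about
        os.unlink(os.path.join(git_dir, name))
        removed.append(name)
    return removed
```

This is only safe when nothing else can update the index while it runs,
which is the point about holding index.lock.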

But I suspect that a temporary index is where split-index shines most,
e.g. while creating a partial commit.  The mechanism penalizes read
performance by making the format more complex in order to favor write
performance, which suits a temporary index well: it is read only once
after it is written and is then discarded (on the other hand,
splitting the primary index will penalize reads, which happen a lot
more often than writes).

While I still find it attractive at the conceptual level to limit
splitting only to the primary index for the resulting simplicity,
I doubt it is a good way to go, as I meant to say in
<xmqqeg33ccjj.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx>

> git-gc just can't match this because while it's running, somebody else
> may be updating $GIT_DIR/index. Handling races would be a lot harder.

It could attempt to take a lock on the primary index while it runs,
and refrain from doing anything if it can't take the lock ("gc --auto"
may want to silently retry), and then the race is no longer an
issue, no?
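
As an illustration of that "take the lock or back off" idea, a minimal
Python sketch (again, not git's lockfile code) could look like the
following. It assumes the lock is the usual index.lock file created
exclusively next to the index; the helper name with_index_lock and its
callback are hypothetical.

```python
import os


def with_index_lock(git_dir, action):
    """Run action() while holding git_dir/index.lock.

    Returns True if the lock was taken and action() ran, or False if
    somebody else already holds the lock, in which case nothing is done.
    """
    lock_path = os.path.join(git_dir, "index.lock")
    try:
        # O_EXCL makes the creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
    except FileExistsError:
        return False  # index is being updated; do not touch anything
    try:
        action()
    finally:
        # Release the lock without having written a new index.
        os.close(fd)
        os.unlink(lock_path)
    return True
```

A caller such as "gc --auto" could treat a False return as "try again
later" rather than as an error.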