On Thu, Oct 27, 2016 at 12:25 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Tue, Oct 25, 2016 at 5:43 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>>>  static int write_shared_index(struct index_state *istate,
>>> @@ -2211,8 +2269,11 @@ static int write_shared_index(struct index_state *istate,
>>>  	}
>>>  	ret = rename_tempfile(&temporary_sharedindex,
>>>  			      git_path("sharedindex.%s", sha1_to_hex(si->base->sha1)));
>>> -	if (!ret)
>>> +	if (!ret) {
>>>  		hashcpy(si->base_sha1, si->base->sha1);
>>> +		clean_shared_index_files(sha1_to_hex(si->base->sha1));
>>
>> This operation is technically garbage collection and should belong in
>> "git gc --auto", which is already called automatically in a few
>> places. Is it not called often enough that we need to do the cleanup
>> right after a new shared index is created?
>
> Christian, if we assume we go with Junio's suggestion to disable
> split-index on temporary files, the only files left we have to take
> care of are index and index.lock. I believe pruning here in this code
> has an advantage over doing it in "git gc --auto": when this code is
> executed, we know we are holding index.lock, so nobody else is
> updating the index, and the cleanup is race-free. All we need to do is
> peek in $GIT_DIR/index to see which shared index file it requires and
> keep that one alive too; the remaining shared index files can be
> deleted safely. We don't even need to fall back to mtime.
>
> git-gc just can't match this, because while it is running somebody
> else may be updating $GIT_DIR/index. Handling the races would be a
> lot harder.

Yeah, I agree that if we disable split-index on temporary files and on
commands like "git read-tree --index-output=<file>", then
write_shared_index() is the right place to do the cleanup.

(In fact, when I wrote the previous RFC series I didn't think about
those special cases, and that was the reason why I did it like this.
So I just need to go back to the implementation that was in the
previous RFC series.)

I am still wondering whether disabling split-index on temporary files
could have a bad performance impact for some use cases, but I guess we
can always come back to that problem if it happens.

Thanks,
Christian
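
PS: Just to make the idea above concrete, here is a rough, standalone
sketch of the kind of pruning Duy describes. This is not the actual
patch: the function name prune_shared_index_files() and the keep[]
parameter are made up for illustration. In the real code the caller
would fill keep[] with the hash of the shared index we have just
written plus the one that $GIT_DIR/index still points at, and would
only call this while holding index.lock.

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustration only: delete every "sharedindex.<hex>" file in git_dir
 * whose hex suffix is not listed in keep[].  Because the caller holds
 * index.lock, no other process can switch $GIT_DIR/index to a shared
 * index file that we are about to remove.
 */
static void prune_shared_index_files(const char *git_dir,
				     const char **keep, int nr_keep)
{
	DIR *dir = opendir(git_dir);
	struct dirent *de;
	char path[4096];
	int i;

	if (!dir)
		return;
	while ((de = readdir(dir)) != NULL) {
		const char *hex;
		int keep_it = 0;

		if (strncmp(de->d_name, "sharedindex.", 12))
			continue;
		hex = de->d_name + 12;
		for (i = 0; i < nr_keep; i++)
			if (!strcmp(hex, keep[i]))
				keep_it = 1;
		if (keep_it)
			continue;
		snprintf(path, sizeof(path), "%s/%s", git_dir, de->d_name);
		if (unlink(path))
			fprintf(stderr, "warning: unable to unlink %s\n", path);
	}
	closedir(dir);
}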