Re: [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort

On 6/22/2021 2:45 PM, Elijah Newren wrote:
> On Tue, Jun 22, 2021 at 9:10 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:

I want to focus on this item:

>> 2. I watched for the partial clone logic to kick in and download blobs.
>>    Some of these were inevitable: we need the blobs to resolve edit/edit
>>    conflicts. In most cases none were downloaded at all, so this series is
>>    working as advertised. There _was_ a case where the inexact rename
>>    detection requested a large list of files (~2900 in three batches) but
>>    _then_ said "inexact rename detection was skipped due to too many
>>    files". This is a case that would be nice to resolve in this series. I
>>    will try to find exactly where in the code this is being triggered and
>>    report back.
> 
> This suggests perhaps that EITHER there was a real modify/delete
> conflict (because you have to do full rename detection to rule out
> that the modify/delete was part of some rename), OR that there was a
> renamed file modified on both sides that did not keep its original
> basename (because that combination is needed to bypass the various
> optimizations and make it fall back to full inexact rename detection).
> Further, in either case, there were enough adds/deletes that full
> inexact detection is still a bit expensive.  It'd be interesting to
> know which case it was.  What happens if you set merge.renameLimit to
> something higher (the default is surprisingly small)?

The behavior I'd like to see is that the partial clone logic is not
run if we are going to download more than merge.renameLimit files.
Whatever code is prefetching these missing blobs runs earlier than
the limit check, but it should run after it instead.

It's particularly problematic that Git does all the work to get the
blobs, but then gives up and doesn't even use them for rename
detection.

I'm happy that we download the necessary blobs when there are a few
dozen files that need inexact rename detection. When it gets into the
thousands, we jump into a different category of user experience.

Having rename-detection limits as a stop-gap is an important way to
avoid massive file downloads in these huge-repo cases. Users can
always opt into a larger limit if they really do want rename
detection to work at such a large scale, but we still need protections
for the vast majority of cases where a user isn't willing to pay the
cost of downloading these blobs.
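For reference, opting into a larger limit is a one-line config change.
A minimal sketch (the throwaway repo and the value 32767 are arbitrary
choices for illustration; the built-in default varies by Git version):

```shell
# Create a scratch repo just for this demonstration.
repo=$(mktemp -d)
git init -q "$repo"

# Raise the merge-time rename-detection ceiling. The value is a file
# count: rename detection is skipped when the number of added/deleted
# files to compare exceeds it.
git -C "$repo" config merge.renameLimit 32767

# Confirm the setting took effect.
git -C "$repo" config merge.renameLimit
```

The limit can also be raised for a single invocation without touching
the repo config, e.g. `git -c merge.renameLimit=32767 merge topic`.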

Thanks,
-Stolee
