Re: [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition

On 12/7/2020 1:19 PM, Jonathan Tan wrote:
>>>> In an effort to discover a happy medium, this change reduces the walk
>>>> for intermediate commits to only the first-parent history. This focuses
>>>> the walk on how the histories converge, which still has significant
>>>> reduction in repeat object walks. It is still possible to create
>>>> quadratic behavior in this version, but it is probably less likely in
>>>> realistic data shapes.
>>>
>>> Would this work? I agree that the width of the commit bitmasks would go
>>> down (and there would also be fewer commit bitmasks generated, further
>>> increasing the memory savings). But intuitively, if there is a commit
>>> that is selected and only accessible through non-1st-parent links, then
>>> any bitmaps generated for it cannot be contributed to its descendants
>>> (since there was no descendant-to-ancestor walk that could reach it in
>>> order to form the reverse edge).
>>
>> s/bitmaps/bitmasks. 
> 
> I do mean bitmaps there - bitmasks are contributed to parents, but
> bitmaps are contributed to descendants, if I remember correctly.

Ah, the confusion centers on the word "contributed".

Yes, without walking all the parents, we will not populate the
reverse edges with all of the possible connections. Thus, the
step that pushes reachability bitmap bits along the reverse edges
will not be as effective.
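To make that step concrete, here is a toy model of pushing bits along reverse edges (Python, with made-up names and plain sets; the real code in pack-bitmap-write.c uses compressed EWAH bitmaps, not sets):

```python
# Hypothetical sketch: a reverse edge from ancestor A to descendant C
# lets C inherit A's reachability bits before C's own walk starts.
def push_bits_along_reverse_edges(reverse_edges, bitmaps):
    # reverse_edges: {ancestor: [descendants]}
    # bitmaps: {commit: set of reachable object ids}
    for ancestor, descendants in reverse_edges.items():
        for d in descendants:
            bitmaps.setdefault(d, set()).update(bitmaps.get(ancestor, set()))
    return bitmaps
```

With only first-parent reverse edges, a descendant reached through a second parent never appears in `reverse_edges`, so its bitmap gets no head start and the subsequent walk re-visits those objects.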

And this is the whole point: the reverse-edges existed to get us
into a state of _never_ walking an object multiple times, but that
ended up being too expensive to guarantee. This change relaxes that
condition in a way that still works for large, linear histories.

Since "pack-bitmap-write: fill bitmap with commit history" changed
fill_bitmap_commit() to walk commits until reaching those already in
the precomputed reachability bitmap, it will correctly walk far
enough to compute the reachability bitmap for that commit. It might
just walk objects that are part of _another_, already computed bitmap
that is not reachable via the first-parent history.
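The stopping condition is the key: the walk halts at anything whose bit is already set, so correctness does not depend on the reverse edges having seeded everything. A minimal sketch of that behavior (illustrative only; the real fill_bitmap_commit() walks trees and blobs too, not just commits):

```python
# Hypothetical sketch of the walk: stop at commits whose bit is
# already set in the bitmap under construction.
def fill_bitmap_commit(commit, parents, bitmap):
    # parents: {commit: [parent commits]}; bitmap: set of visited commits
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in bitmap:
            continue  # already covered, e.g. seeded via a reverse edge
        bitmap.add(c)
        stack.extend(parents.get(c, []))
    return bitmap
```

If the bitmap was pre-seeded with an ancestor's bits, the walk stops there; if not, it simply keeps walking, repeating work that some other, already computed bitmap has done.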

The very next patch "pack-bitmap-write: better reuse bitmaps" fixes
this problem by checking for computed bitmaps during the walk in
fill_bitmap_commit().
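The idea of that fix, again as a hedged sketch rather than the actual patch: when the walk reaches a commit whose bitmap has already been computed, OR in the finished bitmap and stop, instead of re-walking its history.

```python
# Hypothetical sketch of the follow-up fix: reuse finished bitmaps
# encountered during the walk instead of re-walking their history.
def fill_bitmap_commit_reusing(commit, parents, bitmap, computed):
    # computed: {commit: frozenset of bits} for already-built bitmaps
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in bitmap:
            continue
        if c in computed:
            bitmap |= computed[c]  # copy the finished bitmap wholesale
            continue
        bitmap.add(c)
        stack.extend(parents.get(c, []))
    return bitmap
```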

>> We'll select commits independent of their first
>> parent histories, and so in the situation that you're describing, if C
>> reaches A only through non-1st-parent history, then A's bitmask will not
>> contain the bits from C.
> 
> C is the descendant and A is the ancestor. Yes, A's bitmask will not
> contain the bits from C.
> 
>> But when generating the reachability bitmap for C, we'll still find that
>> we've generated a bitmap for A, and we can copy its bits directly. 
> 
> Here is my contention - this can happen only if there is a reverse edge
> from A to C, as far as I can tell, but such a reverse edge has not been
> formed.

See above. This patch is completely correct given the changes to
fill_bitmap_commit() from earlier. It just needs a tweak (in the
next patch) to recover some of the performance.

>> If
>> this differs from an ancestor P that _is_ in the first-parent history,
>> then P pushed its bits to C before calling fill_bitmap_commit() through
>> the reverse edges.
>>
>>>> Here is some data taken on a fresh clone of the kernel:
>>>>
>>>>              |   runtime (sec)    |   peak heap (GB)   |
>>>>              |                    |                    |
>>>>              |   from  |   with   |   from  |   with   |
>>>>              | scratch | existing | scratch | existing |
>>>>   -----------+---------+----------+---------+-----------
>>>>     original |  64.044 |   83.241 |   2.088 |    2.194 |
>>>>   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
>>>>   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>>
>>> Hmm...the jump from 44 to 100 seems rather large.
>>
>> Indeed. It's ameliorated a little bit in the later patches. We are
>> over-walking some objects (as in we are walking them multiple times),
>> but the return we get is reducing the peak heap usage from what it was
>> in the last patch.
>>
>> In the "unfathomably large" category, this makes things tractable.
> 
> Quoting from the next patch [1]:
> 
>>              |   runtime (sec)    |   peak heap (GB)   |
>>              |                    |                    |
>>              |   from  |   with   |   from  |   with   |
>>              | scratch | existing | scratch | existing |
>>   -----------+---------+----------+---------+-----------
>>   last patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>   this patch |  99.720 |   11.696 |   2.152 |    2.217 |
> 
> That is true, but it is not ameliorated much :-(
> 
> If you have steps to generate these timings, I would like to try
> comparing the performance between all patches and all-except-23.
> 
> [1] https://lore.kernel.org/git/42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@xxxxxxxxxxxx/

The biggest problem is that all-except-23 is an unacceptable
final state, since it has a performance blowout on super-wide
repos such as the git/git fork network. Perhaps Taylor could
include some performance numbers on that, but I'm pretty sure
that the calculation literally OOMs instead of completing. It
might be worth an explicit mention in the patch.

It might also be better to always include a baseline from the
start of the series to ensure that the final state is better
than the initial state. With only the last/this comparison,
it doesn't look great when we backtrack in performance (even
when it is necessary to do so).

Thanks,
-Stolee


