Re: [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition

On Tue, Dec 01, 2020 at 11:44:39PM -0800, Jonathan Tan wrote:
> Do you have numbers of how large the commit bitmasks are?

No, I didn't measure the size of the commit bitmasks directly, but they
are captured in the peak heap measurements that I took below.

> > In an effort to discover a happy medium, this change reduces the walk
> > for intermediate commits to only the first-parent history. This focuses
> > the walk on how the histories converge, which still has significant
> > reduction in repeat object walks. It is still possible to create
> > quadratic behavior in this version, but it is probably less likely in
> > realistic data shapes.
>
> Would this work? I agree that the width of the commit bitmasks would go
> down (and there would also be fewer commit bitmasks generated, further
> increasing the memory savings). But intuitively, if there is a commit
> that is selected and only accessible through non-1st-parent links, then
> any bitmaps generated for it cannot be contributed to its descendants
> (since there was no descendant-to-ancestor walk that could reach it in
> order to form the reverse edge).

s/bitmaps/bitmasks/. We'll select commits independently of their
first-parent histories, and so in the situation you're describing, if C
reaches A only through non-first-parent history, then A's bitmask will
not contain the bits from C.

But when generating the reachability bitmap for C, we'll still find
that we have already generated a bitmap for A, and we can copy its bits
directly. The difference from an ancestor P that _is_ in C's
first-parent history is that P pushed its bits to C through the reverse
edges before fill_bitmap_commit() was called.
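
To make that concrete, here's a minimal toy sketch of the two paths
bits can take into C's bitmap. It's purely illustrative: plain
unsigned ints stand in for EWAH bitmaps, and the structures and
helpers below are invented for this example, not the real
pack-bitmap-write.c code.

    #include <stdio.h>

    struct commit {
            const char *name;
            unsigned bits;             /* objects introduced here */
            unsigned filled;           /* reachability bitmap so far */
            int has_bitmap;            /* bitmap already generated? */
            struct commit *parents[2]; /* [0] is the first parent */
    };

    /* Walk ancestry, but stop at any commit whose bitmap is already
     * finished and copy its bits instead of re-walking its history. */
    static void fill_walk(struct commit *c, unsigned *out)
    {
            if (!c)
                    return;
            if (c->has_bitmap) {
                    *out |= c->filled;  /* e.g. reusing A's bitmap */
                    return;
            }
            *out |= c->bits;
            for (int i = 0; i < 2; i++)
                    fill_walk(c->parents[i], out);
    }

    static void fill_bitmap_commit(struct commit *c)
    {
            /* c->filled may already hold bits pushed from
             * first-parent ancestors (like P) over the reverse
             * edges */
            c->filled |= c->bits;
            for (int i = 0; i < 2; i++)
                    fill_walk(c->parents[i], &c->filled);
            c->has_bitmap = 1;
    }

    int main(void)
    {
            struct commit a = { "A", 1u << 0, 0, 0, { NULL, NULL } };
            struct commit p = { "P", 1u << 1, 0, 0, { NULL, NULL } };
            struct commit c = { "C", 1u << 2, 0, 0, { &p, &a } };

            fill_bitmap_commit(&a); /* A reached only via 2nd parent */
            fill_bitmap_commit(&p); /* P is in C's 1st-parent history */
            c.filled |= p.filled;   /* reverse-edge push from P */
            fill_bitmap_commit(&c); /* copies A's bits, no re-walk */

            printf("C reaches: 0x%x (expect 0x7)\n", c.filled);
            return 0;
    }

Running it prints 0x7: C picks up A's bit, P's bit, and its own
without ever re-walking A's history.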

> > Here is some data taken on a fresh clone of the kernel:
> >
> >              |   runtime (sec)    |   peak heap (GB)   |
> >              |                    |                    |
> >              |   from  |   with   |   from  |   with   |
> >              | scratch | existing | scratch | existing |
> >   -----------+---------+----------+---------+----------+
> >     original |  64.044 |   83.241 |   2.088 |    2.194 |
> >   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
> >   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
>
> Hmm...the jump from 44 to 100 seems rather large.

Indeed. It's ameliorated a little in the later patches. We are
over-walking some objects (that is, walking them multiple times), but
in return we reduce the peak heap usage below what it was in the last
patch.

For repositories in the "unfathomably large" category, that trade-off
is what makes things tractable.

Thanks,
Taylor


