> > > In an effort to discover a happy medium, this change reduces the walk
> > > for intermediate commits to only the first-parent history. This focuses
> > > the walk on how the histories converge, which still has significant
> > > reduction in repeat object walks. It is still possible to create
> > > quadratic behavior in this version, but it is probably less likely in
> > > realistic data shapes.
> >
> > Would this work? I agree that the width of the commit bitmasks would go
> > down (and there would also be fewer commit bitmasks generated, further
> > increasing the memory savings). But intuitively, if there is a commit
> > that is selected and only accessible through non-1st-parent links, then
> > any bitmaps generated for it cannot be contributed to its descendants
> > (since there was no descendant-to-ancestor walk that could reach it in
> > order to form the reverse edge).
>
> s/bitmaps/bitmasks.

I do mean bitmaps there - bitmasks are contributed to parents, but
bitmaps are contributed to descendants, if I remember correctly.

> We'll select commits independent of their first
> parent histories, and so in the situation that you're describing, if C
> reaches A only through non-1st-parent history, then A's bitmask will not
> contain the bits from C.

C is the descendant and A is the ancestor. Yes, A's bitmask will not
contain the bits from C.

> But when generating the reachability bitmap for C, we'll still find that
> we've generated a bitmap for A, and we can copy its bits directly.

Here is my contention - this can happen only if there is a reverse edge
from A to C, as far as I can tell, but such a reverse edge has not been
formed.

> If
> this differs from an ancestor P that _is_ in the first-parent history,
> then P pushed its bits to C before calling fill_bitmap_commit() through
> the reverse edges.
>
> > > Here is some data taken on a fresh clone of the kernel:
> > >
> > >             |   runtime (sec)    |   peak heap (GB)   |
> > >             |                    |                    |
> > >             |   from  |   with   |   from  |   with   |
> > >             | scratch | existing | scratch | existing |
> > > ------------+---------+----------+---------+----------
> > >   original  |  64.044 |   83.241 |   2.088 |    2.194 |
> > >  last patch |  44.811 |   27.828 |   2.289 |    2.358 |
> > >  this patch | 100.641 |   35.560 |   2.152 |    2.224 |
> >
> > Hmm...the jump from 44 to 100 seems rather large.
>
> Indeed. It's ameliorated a little bit in the later patches. We are
> over-walking some objects (as in we are walking them multiple times),
> but the return we get is reducing the peak heap usage from what it was
> in the last patch.
>
> In the "unfathomably large" category, this makes things tractable.
> Quoting from the next patch [1]:
>
>             |   runtime (sec)    |   peak heap (GB)   |
>             |                    |                    |
>             |   from  |   with   |   from  |   with   |
>             | scratch | existing | scratch | existing |
> ------------+---------+----------+---------+----------
>  last patch | 100.641 |   35.560 |   2.152 |    2.224 |
>  this patch |  99.720 |   11.696 |   2.152 |    2.217 |

That is true, but it is not ameliorated much :-(

If you have steps to generate these timings, I would like to try
comparing the performance between all patches and all-except-23.

[1] https://lore.kernel.org/git/42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@xxxxxxxxxxxx/
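
To make the reverse-edge point concrete, here is a toy C sketch of the
shape I have in mind. It is purely illustrative: the struct and helper
names are made up and are not the real pack-bitmap-write.c data
structures (which, among other things, keep a list of reverse edges
rather than a single pointer).

/* reverse-edge-sketch.c: a toy model, NOT the real pack-bitmap-write.c. */
#include <stdio.h>

struct toy_commit {
	const char *name;
	struct toy_commit *parent1;      /* first parent, or NULL */
	struct toy_commit *parent2;      /* second parent (merge), or NULL */
	struct toy_commit *reverse_edge; /* descendant that may reuse our bitmap */
};

/*
 * The intermediate walk in this patch follows only first parents, so
 * only the first-parent ancestors of `tip` get a reverse edge pointing
 * back at it.
 */
static void add_reverse_edges_first_parent(struct toy_commit *tip)
{
	struct toy_commit *c;
	for (c = tip->parent1; c; c = c->parent1)
		c->reverse_edge = tip;
}

int main(void)
{
	/*
	 * History:
	 *   C --1st--> P --1st--> root
	 *    \--2nd--> A --1st--> root
	 * A is a selected commit with its own bitmap, but it is reachable
	 * from C only through a second-parent link.
	 */
	struct toy_commit root = { "root", NULL, NULL, NULL };
	struct toy_commit A = { "A", &root, NULL, NULL };
	struct toy_commit P = { "P", &root, NULL, NULL };
	struct toy_commit C = { "C", &P, &A, NULL };
	struct toy_commit *all[] = { &root, &A, &P, &C };
	int i;

	add_reverse_edges_first_parent(&C);

	for (i = 0; i < 4; i++)
		printf("%-4s reverse edge -> %s\n", all[i]->name,
		       all[i]->reverse_edge ? all[i]->reverse_edge->name
					    : "(none)");
	/*
	 * A has no reverse edge to C, so when C's bitmap is filled there
	 * is nothing to copy A's bits from; those objects get walked again.
	 */
	return 0;
}

Running this prints that P and root each end up with a reverse edge back
to C while A has none, which is why I think C's bitmap cannot simply
reuse A's bits in this shape.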