Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 18, 2021 at 03:35:05PM +0200, Patrick Steinhardt wrote:

> > More than once I've spent many head-scratching hours trying to determine
> > why some real-world repo performs better or worse than expected. :)
> 
> I couldn't agree more. I've also had my fair share of weird performance
> characteristics in completely unexpected ways. Unfortunately, it has
> made me become quite cautious about bitmaps given that they've already
> caused their fair share of perfomance regressions.
> 
> But your work here actually encourages me to give it another try soonish
> and see what kind of repo shapes make them explode this time.

Not really for immediate work, but I just wanted to note while I'm
thinking about it: the two big things that can actually make bitmaps
_worse_ than a regular traversal are:

  - loading the bitmaps is inherently linear in the file size. We could
    do better with an mmap-able index of bitmapped commits in the file.
    This would be easy to add as an extension to the file. This matters
    most for really small queries (the absolute time to parse the bitmap
    file isn't that high, but if the non-bitmap query can be answered in
    a few milliseconds, it becomes relatively more important).

  - we produce a definite answer of reachability for the bitmaps by
    filling in a bitmap for the negative (UNINTERESTING) tips, then one
    for the regular tips, and then taking a set difference. Whereas a
    normal traversal walks both sides together until it hits a merge
    base, and then actually looks into the trees in the boundary
    commits. This normal-traversal technique may look at fewer objects
    overall for some cases (especially if you have a lot of negative
    tips, like with "--not --all" and not-so-good bitmap coverage). But
    it does mean we may report an object as in the traversal when it's
    actually not (if it was added/removed in part of the uninteresting
    history, and so not present at the boundary).

    We could possibly teach the bitmap traversal to behave more like
    a normal one. I suspect this would clear up most of the regressions
    you'd see using bitmaps with rev-list.

There is one other related case I've run into. If you put a bitmap on
every commit, you'd _think_ that would make everything faster, since
you'd never traverse at all. But there's an inflection point where you
spend more time going through each bitmap twiddling bits (which remember
is O(nr_of_objects * nr_bitmapped_commits) bits across all of them). And
most of those bits will already be "1", so there's a point where the
saved traversal is less than the cost of OR-ing bits.

Anyway, this is all big-picture stuff that I don't expect you to work on
or even respond to. I'm mostly just writing up some thoughts that I
don't think have made it to the list in a coherent form. I'm not sure if
I'll eventually work on some of them or not (but anybody who is
interested can be my guest :) ).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux