Re: What is the status of GSoC 2022 work on making Git use roaring bitmaps?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On Tue, 1 Aug 2023 at 13:54, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote:
> On Tue, Aug 1, 2023 at 1:35 PM Jakub Narębski <jnareb@xxxxxxxxx> wrote:
> > On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote:
> > > On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
> > > >
> > > > I haven't proved conclusively one way or the other where Roaring+Run is
> > > > significantly faster than EWAH or vice-versa. There are some cases where
> > > > the former is a clear winner, and other cases where it's the latter.
> > > >
> > > > In any event, my extremely WIP patches to make this mostly work are
> > > > available here:
> > > >
> > > >   https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps
> > > >
> > >
> > > thanks. For anyone reading along, the changes to JGit are here
> > >
> > > https://git.eclipse.org/r/c/jgit/jgit/+/203448
> > >
> > > I was looking into this because I was hoping that roaring might
> > > decrease peak memory usage.
> > >
> > > I don't have firm evidence that it's better or worse, but I did
> > > observe that runtime and memory usage during GC (which is heavy on
> > > bitmap operations due to delta/xor encoding) was unchanged. That makes
> > > me pessimistic that there are significant gains to be had.
> >
> > The major advantage Roaring bitmaps have over EWAH and other
> > simple Run Length Encoding based compression algorithms is that
> > bitmap operations can be done on compressed bitmaps: there is no
> > need to uncompress bitmap to do (want1 OR want2 AND NOT have).
>
> Are you sure? The source code for and and andNot look rather similar
> in that they seem to do operations on whole RLE sections at a time,
>
> https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L498
>
> https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L405
>
> Looking at the EWAH format as documented for git-bitmap-format, EWAH
> allows for RLE on both 1s and 0s. It should be possible to efficiently
> clear out a section of the target if the second operand of andNot has
> RLE encoded run of 1s.

You are right, I seem to have misremembered the statement from the
EWAH paper.

Lemma 2 in "Sorting improves word-aligned bitmap indexes" (arXiv:0901.3751v7)
states that the bitmap operation of L bitmaps is computable in
O(L*compressed size),
and for updatable L-ary operation like symmetric boolean operation
the bitmap operation is computable in O(log(L)*compressed size).

Now I am not sure if I understand the code of Git correctly, but it seems like
in pack-bitmap.c the `add_commit_to_bitmap()` function stores the result
of OR operation on bitmaps as an uncompressed bitmap:
https://github.com/git/git/blob/master/pack-bitmap.c#L1030

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux