Re: What is the status of GSoC 2022 work on making Git use roaring bitmaps?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello

On Tue, 1 Aug 2023 at 19:34, Taylor Blau <me@xxxxxxxxxxxx> wrote:
> On Tue, Aug 01, 2023 at 01:34:32PM +0200, Jakub Narębski wrote:
> > On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote:
> > > On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
> > > >
> > > > I haven't proved conclusively one way or the other where Roaring+Run is
> > > > significantly faster than EWAH or vice-versa. There are some cases where
> > > > the former is a clear winner, and other cases where it's the latter.
> > > >
> > > > In any event, my extremely WIP patches to make this mostly work are
> > > > available here:
> > > >
> > > >   https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps
> > > >
> > >
> > > thanks. For anyone reading along, the changes to JGit are here
> > >
> > > https://git.eclipse.org/r/c/jgit/jgit/+/203448
> > >
> > > I was looking into this because I was hoping that roaring might
> > > decrease peak memory usage.
> > >
> > > I don't have firm evidence that it's better or worse, but I did
> > > observe that runtime and memory usage during GC (which is heavy on
> > > bitmap operations due to delta/xor encoding) was unchanged. That makes
> > > me pessimistic that there are significant gains to be had.
> >
> > The major advantage Roaring bitmaps have over EWAH and other
> > simple Run Length Encoding based compression algorithms is that
> > bitmap operations can be done on compressed bitmaps: there is no
> > need to uncompress bitmap to do (want1 OR want2 AND NOT have).
>
> Yeah, this is definitely where the majority of CPU savings seems to
> remain. The existing implementation in my branch is much too eager to
> uncompress bitmaps when we need to perform a logical/binary operation on
> them.

As I understand it, the current code in Git (in C implementation) uses
uncompressed bitmap to store the result of OR-ing, but it uses compressed
EWAH to perform <uncompressed result> OR <EWAH-compressed bitmap>.

> I think with some more surgery we could leave bitmaps in a compressed
> state for longer. I am not sure whether or not we should ever uncompress
> the bitmaps, though it's possible that doing so is beneficial since
> uncompressed bitmaps have better query performance (albeit more costly
> memory usage).

I'm not sure if it would be worth the complexity, but supposedly you can
perform L bitwise OR operations in O(log(L)) instead of O(L). Current C code
computes OR operation sequentially, see above, and also
  https://github.com/git/git/blob/master/pack-bitmap.c#L103\0

There might be thousands of "wants" and of "haves", but is it common?

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux