Hello, On Tue, 1 Aug 2023 at 13:54, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote: > On Tue, Aug 1, 2023 at 1:35 PM Jakub Narębski <jnareb@xxxxxxxxx> wrote: > > On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote: > > > On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > > > > > > > I haven't proved conclusively one way or the other where Roaring+Run is > > > > significantly faster than EWAH or vice-versa. There are some cases where > > > > the former is a clear winner, and other cases where it's the latter. > > > > > > > > In any event, my extremely WIP patches to make this mostly work are > > > > available here: > > > > > > > > https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps > > > > > > > > > > thanks. For anyone reading along, the changes to JGit are here > > > > > > https://git.eclipse.org/r/c/jgit/jgit/+/203448 > > > > > > I was looking into this because I was hoping that roaring might > > > decrease peak memory usage. > > > > > > I don't have firm evidence that it's better or worse, but I did > > > observe that runtime and memory usage during GC (which is heavy on > > > bitmap operations due to delta/xor encoding) was unchanged. That makes > > > me pessimistic that there are significant gains to be had. > > > > The major advantage Roaring bitmaps have over EWAH and other > > simple Run Length Encoding based compression algorithms is that > > bitmap operations can be done on compressed bitmaps: there is no > > need to uncompress bitmap to do (want1 OR want2 AND NOT have). > > Are you sure? The source code for and and andNot look rather similar > in that they seem to do operations on whole RLE sections at a time, > > https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L498 > > https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L405 > > Looking at the EWAH format as documented for git-bitmap-format, EWAH > allows for RLE on both 1s and 0s. It should be possible to efficiently > clear out a section of the target if the second operand of andNot has > RLE encoded run of 1s. You are right, I seem to have misremembered the statement from the EWAH paper. Lemma 2 in "Sorting improves word-aligned bitmap indexes" (arXiv:0901.3751v7) states that the bitmap operation of L bitmaps is computable in O(L*compressed size), and for updatable L-ary operation like symmetric boolean operation the bitmap operation is computable in O(log(L)*compressed size). Now I am not sure if I understand the code of Git correctly, but it seems like in pack-bitmap.c the `add_commit_to_bitmap()` function stores the result of OR operation on bitmaps as an uncompressed bitmap: https://github.com/git/git/blob/master/pack-bitmap.c#L1030 Best, -- Jakub Narębski