Hello On Tue, 1 Aug 2023 at 19:34, Taylor Blau <me@xxxxxxxxxxxx> wrote: > On Tue, Aug 01, 2023 at 01:34:32PM +0200, Jakub Narębski wrote: > > On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys <hanwen@xxxxxxxxxx> wrote: > > > On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > > > > > > > I haven't proved conclusively one way or the other where Roaring+Run is > > > > significantly faster than EWAH or vice-versa. There are some cases where > > > > the former is a clear winner, and other cases where it's the latter. > > > > > > > > In any event, my extremely WIP patches to make this mostly work are > > > > available here: > > > > > > > > https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps > > > > > > > > > > thanks. For anyone reading along, the changes to JGit are here > > > > > > https://git.eclipse.org/r/c/jgit/jgit/+/203448 > > > > > > I was looking into this because I was hoping that roaring might > > > decrease peak memory usage. > > > > > > I don't have firm evidence that it's better or worse, but I did > > > observe that runtime and memory usage during GC (which is heavy on > > > bitmap operations due to delta/xor encoding) was unchanged. That makes > > > me pessimistic that there are significant gains to be had. > > > > The major advantage Roaring bitmaps have over EWAH and other > > simple Run Length Encoding based compression algorithms is that > > bitmap operations can be done on compressed bitmaps: there is no > > need to uncompress bitmap to do (want1 OR want2 AND NOT have). > > Yeah, this is definitely where the majority of CPU savings seems to > remain. The existing implementation in my branch is much too eager to > uncompress bitmaps when we need to perform a logical/binary operation on > them. As I understand it, the current code in Git (in C implementation) uses uncompressed bitmap to store the result of OR-ing, but it uses compressed EWAH to perform <uncompressed result> OR <EWAH-compressed bitmap>. > I think with some more surgery we could leave bitmaps in a compressed > state for longer. I am not sure whether or not we should ever uncompress > the bitmaps, though it's possible that doing so is beneficial since > uncompressed bitmaps have better query performance (albeit more costly > memory usage). I'm not sure if it would be worth the complexity, but supposedly you can perform L bitwise OR operations in O(log(L)) instead of O(L). Current C code computes OR operation sequentially, see above, and also https://github.com/git/git/blob/master/pack-bitmap.c#L103\0 There might be thousands of "wants" and of "haves", but is it common? Best, -- Jakub Narębski