On Fri, Jun 15, 2018 at 02:23:44PM -0400, Derrick Stolee wrote:

> If we are considering changing the reachability bitmap, then I'm very
> intrigued. I think the number one thing to consider is to use the
> multi-pack-index as a reference point (with a stable object order) so the
> objects can be repacked independently from the reachability bitmap
> computation. If we are changing the model at that level, then it is worth
> thinking about other questions, like how we index the file or how we
> compress the bitmaps.

I'm open to a new format if it provides significant improvements over
the existing one. I think the existing bitmaps have served us well for
several years, but they do have a few weaknesses. I mentioned some of
them before, but the most obvious one is that, being very pack-oriented,
they require repacking to update (and don't handle cross-pack
reachability at all).

I know that doesn't fly for Windows-sized repos at all, but it would
also be nice if we could do incremental updates more cheaply (e.g.,
after every push instead of just once a day).

The Roaring stuff looks really interesting. I'm curious about the stable
object order you guys use. Because EWAH is basically run-length
encoding, it benefits hugely from having the bitmaps in pack order
(where there's enormous locality with respect to reachability) as
opposed to sha1 order (where it's essentially random).

Is your stable object order based on traversing the commit graph? Or
does Roaring do a sufficiently better job of compressing the jumbled
sha1 order?

-Peff
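
To make the locality point about run-length encoding concrete, here is a rough sketch, not Git's actual code. It assumes a toy model in which the objects reachable from one commit form a single contiguous block in a locality-preserving order, and are scattered uniformly in a hash-like order. The names count_runs and demo are made up for illustration, and real EWAH works on whole words rather than single bits, so treat the numbers as indicative only.

```python
import random


def count_runs(bits):
    """Count maximal runs of equal bits.

    A run-length encoder stores roughly one entry per run, so fewer
    runs means a smaller encoded bitmap.
    """
    if not bits:
        return 0
    runs = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur != prev:
            runs += 1
    return runs


def demo(num_objects=10_000, reachable=4_000, seed=0):
    """Compare run counts for the same bitmap under two object orders.

    Assumption (toy model): in a locality-preserving order the reachable
    objects are one contiguous block; in a hash-like order the same bits
    are shuffled into effectively random positions.
    """
    clustered = [1] * reachable + [0] * (num_objects - reachable)
    scattered = clustered[:]
    random.Random(seed).shuffle(scattered)
    return count_runs(clustered), count_runs(scattered)


clustered_runs, scattered_runs = demo()
print("locality-preserving order:", clustered_runs, "runs")
print("hash-like order:", scattered_runs, "runs")
```

In this toy model the contiguous order needs only two runs, while the shuffled order needs thousands, which is why the choice of a stable object order matters so much for any run-length-based scheme.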