On Thu, Oct 11, 2018 at 08:33:58AM -0400, Derrick Stolee wrote:

> > I don't know if this is a fruitful path at all or not. I was mostly just
> > satisfying my own curiosity on the bitmap encoding question. But I'll
> > post the patches, just to show my work. The first one is the same
> > initial proof of concept I showed earlier.
> >
> >   [1/3]: initial tree-bitmap proof of concept
> >   [2/3]: test-tree-bitmap: add "dump" mode
> >   [3/3]: test-tree-bitmap: replace ewah with custom rle encoding
> >
> >  Makefile                    |   1 +
> >  t/helper/test-tree-bitmap.c | 344 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 345 insertions(+)
> >  create mode 100644 t/helper/test-tree-bitmap.c
>
> I'm trying to test this out myself, and am having trouble reverse
> engineering how I'm supposed to test it.
>
> Looks like running "t/helper/test-tree-bitmap gen" will output a lot of
> binary data. Where should I store that? Does any file work?

Yeah, you can do:

  # optionally run with GIT_TRACE=1 to see some per-bitmap stats
  test-tree-bitmap gen >out

  # this should be roughly the same as:
  #   git rev-list --all |
  #   git diff-tree --stdin -t --name-only
  test-tree-bitmap dump <out

> Is this series just for the storage costs, assuming that we would replace
> all TREESAME checks with a query into this database? Or do you have a way to
> test how much this would improve a "git log -- <path>" query?

Right, I was just looking at storage cost here. It's not integrated with
the diff code at all. I left some hypothetical numbers elsewhere in the
thread.

-Peff