On Tue, Jun 25, 2019 at 3:06 AM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> I'm curious how big these EWAHs will be in practice and
> how useful an array of integers will be (especially as the
> pretty format will be one integer per line). Perhaps it
> would be helpful to have an extended example in one of the
> tests.

It's one integer per updated entry. So if you have a giant index and
update every single entry, the EWAH bitmap contains that many
integers.

If it were easy to merge these bitmaps back into the entries (e.g. in
this example, add "replaced": true to entry zero), I would have done
it. But we dump as we stream, so by the time we get to the bitmap it's
already too late to do that.

> Would it be better to have the caller of ewah_each_bit()
> build a hex or bit string in a strbuf and then write it
> as a single string?

I don't think the current EWAH representation is easy to read in the
first place. You'll probably have to run the dump through a script
that updates the main entries part to get a much better view, but
that's pretty quick to do.

If it's for scripts, then it's probably best to keep it as an array of
integers, not a string. Less post-processing.

Another reason for not merging into one string (might not be a very
good argument though) is to help diff between two indexes.
One-number-per-line works well with "git diff --no-index", while one
long string is a bit harder to compare. I did this kind of comparison
when I made changes in read-cache.c and wanted to check whether the
new index file was completely broken, or just slightly broken.

--
Duy
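
To illustrate the post-processing script mentioned above, here is a
minimal Python sketch. It assumes a hypothetical dump layout: a
top-level "entries" array plus a "replaced" array of integers, where
each integer is the position of an updated entry in that same array.
Those key names, the sample data and the helper merge_replaced() are
made up for illustration; the actual dump format may differ.

```python
import json


def merge_replaced(dump):
    """Return copies of the entries, flagging those listed in the bitmap array.

    Assumes the hypothetical keys "entries" and "replaced" described above.
    """
    entries = [dict(entry) for entry in dump["entries"]]
    for position in dump.get("replaced", []):
        # Each integer is taken to be the position of one updated entry.
        entries[position]["replaced"] = True
    return entries


# Made-up sample dump; the layout is an assumption, not the real format.
sample = json.loads("""
{
  "entries": [{"name": "Makefile"}, {"name": "read-cache.c"}],
  "replaced": [
    0
  ]
}
""")

for entry in merge_replaced(sample):
    print(json.dumps(entry))
```

With an array of integers the script only has to index into the
entries; a single hex or bit string would need to be decoded first.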