On 6/27/2019 9:24 AM, Jeff Hostetler wrote:
> On 6/27/2019 6:48 AM, Duy Nguyen wrote:
>> On Tue, Jun 25, 2019 at 7:40 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>>>
>>> On 6/25/2019 6:29 AM, Duy Nguyen wrote:
>>>> On Tue, Jun 25, 2019 at 3:06 AM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>>>>> I'm curious how big these EWAHs will be in practice and
>>>>> how useful an array of integers will be (especially as the
>>>>> pretty format will be one integer per line). Perhaps it
>>>>> would be helpful to have an extended example in one of the
>>>>> tests.
>>>>
>>>> It's one integer per updated entry. So if you have a giant index and
>>>> updated every single one of them, the EWAH bitmap contains that many
>>>> integers.
>>>>
>>>> If it was easy to just merge these bitmaps back to the entry (e.g. in
>>>> this example, add "replaced": true to entry zero) I would have done
>>>> it. But we dump as we stream and it's already too late to do it.
>>>>
>>>>> Would it be better to have the caller of ewah_each_bit()
>>>>> build a hex or bit string in a strbuf and then write it
>>>>> as a single string?
>>>>
>>>> I don't think the current EWAH representation is easy to read in the
>>>> first place. You'll probably have to run through some script to update
>>>> the main entries part and will have a much better view, but that's
>>>> pretty quick. If it's for scripts, then it's probably best to keep as
>>>> an array of integers, not a string. Less post processing.
>>>
>>> I don't think the intent is to dump the EWAH directly, but instead to
>>> dump a string of the uncompressed bitmap. Something like:
>>>
>>> "delete_bitmap" : "011011011101"
>>>
>>> instead of
>>>
>>> "delete_bitmap" : [ 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1 ]
>>
>> I get this part. But the numbers in the array were the position of the
>> set bits. It's not showing just the actual bit map.
>>
>> The same bitmap would be currently displayed as
>>
>> "delete_bitmap": [ 1, 2, 4, 5, 7, 8, 9, 11 ]
>>
>> And that maps back to the entry[1], entry[2], entry[4]... in the index
>> being deleted from the base index. So displaying as a real bit map
>> actually adds more work for both the reader and the tool because you
>> have to calculate the position either way. And it gets harder if the
>> bit you're interested in is on the far right.
>
>
> Thanks for the clarification. That helps.

Same here! We expect these to be much smaller than the full set, correct?

Thanks,
-Stolee
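
For anyone post-processing the dump, here is a minimal sketch of converting between the two representations discussed in this thread: the list of set-bit positions and an uncompressed bit string. It assumes Python for the script, and the helper names `positions_to_bitstring` and `bitstring_to_positions` are hypothetical; neither is part of the patch or of Git.

```python
def positions_to_bitstring(positions, length=None):
    """Render a list of set-bit positions as a '0'/'1' string, index 0 first.

    `length` is an assumption of this sketch: when omitted, the string is
    just long enough to hold the highest set bit.
    """
    if length is None:
        length = max(positions) + 1 if positions else 0
    bits = ["0"] * length
    for pos in positions:
        bits[pos] = "1"
    return "".join(bits)


def bitstring_to_positions(bitstring):
    """Recover the positions of the set bits from a '0'/'1' string."""
    return [i for i, bit in enumerate(bitstring) if bit == "1"]


# The positions from the "delete_bitmap" example in this thread.
positions = [1, 2, 4, 5, 7, 8, 9, 11]
bitstring = positions_to_bitstring(positions)
print(bitstring)
print(bitstring_to_positions(bitstring) == positions)
```

With the position list, a script can index straight into the entries array; with the bit string, it first has to scan the string and count, which is the extra work Duy describes.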