Re: [PATCH v2 05/10] split-index.c: dump "link" extension as json

On Thu, Jun 27, 2019 at 8:42 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 6/27/2019 9:24 AM, Jeff Hostetler wrote:
> > On 6/27/2019 6:48 AM, Duy Nguyen wrote:
> >> On Tue, Jun 25, 2019 at 7:40 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
> >>>
> >>> On 6/25/2019 6:29 AM, Duy Nguyen wrote:
> >>>> On Tue, Jun 25, 2019 at 3:06 AM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
> >>>>> I'm curious how big these EWAHs will be in practice and
> >>>>> how useful an array of integers will be (especially as the
> >>>>> pretty format will be one integer per line).  Perhaps it
> >>>>> would be helpful to have an extended example in one of the
> >>>>> tests.
> >>>>
> >>>> It's one integer per updated entry. So if you have a giant index and
> >>>> updated every single one of them, the EWAH bitmap contains that many
> >>>> integers.
> >>>>
> >>>> If it was easy to just merge these bitmaps back to the entry (e.g. in
> >>>> this example, add "replaced": true to entry zero) I would have done
> >>>> it. But we dump as we stream and it's already too late to do it.
> >>>>
> >>>>> Would it be better to have the caller of ewah_each_bit()
> >>>>> build a hex or bit string in a strbuf and then write it
> >>>>> as a single string?
> >>>>
> >>>> I don't think the current EWAH representation is easy to read in the
> >>>> first place. You'll probably have to run through some script to update
> >>>> the main entries part and will have a much better view, but that's
> >>>> pretty quick. If it's for scripts, then it's probably best to keep as
> >>>> an array of integers, not a string. Less post processing.
> >>>
> >>> I don't think the intent is to dump the EWAH directly, but instead to
> >>> dump a string of the uncompressed bitmap. Something like:
> >>>
> >>>          "delete_bitmap" : "011011011101"
> >>>
> >>> instead of
> >>>
> >>>          "delete_bitmap" : [ 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1 ]
> >>
> >> I get this part. But the numbers in the array are the positions of the
> >> set bits. It's not showing just the actual bit map.
> >>
> >> The same bitmap would be currently displayed as
> >>
> >>   "delete_bitmap": [ 1, 2, 4, 5, 7, 8, 9, 11 ]
> >>
> >> And that maps back to the entry[1], entry[2], entry[4]... in the index
> >> being deleted from the base index. So displaying as a real bit map
> >> actually adds more work for both the reader and the tool because you
> >> have to calculate the position either way. And it gets harder if the
> >> bit you're interested in is on the far right.
> >
> >
> > Thanks for the clarification.  That helps.
>
> Same here! We expect these to be much smaller than the full set, correct?

For split-index, the number of 1 bits should be about the size of your
working set, not the index size. In the normal case, then yes it
should be much smaller. After a big merge or branch switch, it could
get as big as the index. But I would hope the logic to re-split the
index kicks in, which essentially empties these bitmaps.

The EWAH bitmap is also used in the UNTR extension, if I remember
correctly. Those bitmaps may have as many bits as there are directories
in the index.
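
To make the position-array versus bit-string point from the quoted
discussion concrete, here is a minimal sketch of the post-processing a
script would do in each case. It is only an illustration: the JSON
fragment, the entry names and both helper functions are hypothetical and
are not taken from the patch or from any git tool.

```python
import json


def bitstring_to_positions(bits):
    """Return the indices of the '1' characters in a bit string."""
    return [i for i, ch in enumerate(bits) if ch == "1"]


def positions_to_bitstring(positions, size):
    """Render set-bit positions as a bit string covering `size` entries."""
    set_bits = set(positions)
    return "".join("1" if i in set_bits else "0" for i in range(size))


# Hypothetical dump fragment in the position-array form.
dump = json.loads('{"delete_bitmap": [1, 2, 4, 5, 7, 8, 9, 11]}')

# Hypothetical base index entries, named only for illustration.
base_entries = ["entry%d" % i for i in range(12)]

# With positions, looking up the affected entries is a direct index.
deleted = [base_entries[i] for i in dump["delete_bitmap"]]

# With a bit string, the reader has to recover the positions first.
as_bits = positions_to_bitstring(dump["delete_bitmap"], len(base_entries))
recovered = bitstring_to_positions(as_bits)
```

With the array form the script indexes straight into the entries; with
the string form it has to do the extra bitstring_to_positions() step
before it can do anything useful.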

> Thanks,
> -Stolee
>


-- 
Duy


