Re: [PATCH 4/8] add functions for memory-efficient bitmaps

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 01 Jul 2014 09:57:13 -0700

Jeff King <peff@xxxxxxxx> writes:

> On Sun, Jun 29, 2014 at 03:41:37AM -0400, Eric Sunshine wrote:
>
>> > +static inline void bitset_set(unsigned char *bits, int n)
>> > +{
>> > +       bits[n / CHAR_BIT] |= 1 << (n % CHAR_BIT);
>> > +}
>> 
>> Is it intentional or an oversight that there is no way to clear a bit
>> in the set?
>
> Intentional in the sense that I had no need for it in my series, and I
> didn't think about it. I doubt many callers would want it, since commit
> traversals tend to propagate bits through the graph, and then clean them
> up all at once. And the right way to clean up slabbed data like this is
> to just clear the slab.
>
> Of course somebody may use the code for something besides commit
> traversals. But I'd rather avoid adding dead code on the off chance that
> somebody uses it later (and then gets to find out whether it even works
> or not!).

Another thing I noticed was that the definitions of, and the
commentary on, bitset_equal() and bitset_empty() sounded somewhat
"undecided".  These functions take "max", deliberately named
differently from "num_bits" (the width of the bitsets involved).
That invites callers to test only the earlier bits in a bitset, as
long as they understand the caveat.  But the caveat requires the
partial bitset under test to be byte-aligned, which makes this not
very useful in practice, so we probably do not want them to be used
with any "max" other than "num_bits".

They should probably either:

 * be made to truly honor max < num_bits case, by special casing the
   last byte that has max-th bit, to officially allow them to be
   used for partial bitset test; or

 * take "num_bits", not "max", to clarify that callers must use them
   only on the full bitset.
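
To illustrate the first option, here is a minimal sketch of what
special-casing the last byte could look like.  The *_partial names
are hypothetical and not part of the patch; the sketch assumes the
same bit layout as bitset_set() (bit n lives at position n % CHAR_BIT
of byte n / CHAR_BIT).

```c
#include <limits.h>
#include <string.h>

/*
 * Hypothetical variant, not in the patch: return 1 if the first
 * "max" bits of "bits" are all zero, ignoring any bits beyond them.
 */
static inline int bitset_empty_partial(const unsigned char *bits, int max)
{
	int full = max / CHAR_BIT;	/* bytes covered entirely by "max" */
	int rest = max % CHAR_BIT;	/* bits used in the last, partial byte */
	int i;

	for (i = 0; i < full; i++)
		if (bits[i])
			return 0;
	/* Only the low "rest" bits of the final byte are in range. */
	if (rest && (bits[full] & ((1u << rest) - 1)))
		return 0;
	return 1;
}

/*
 * Hypothetical variant, not in the patch: return 1 if the first
 * "max" bits of "a" and "b" agree.
 */
static inline int bitset_equal_partial(const unsigned char *a,
				       const unsigned char *b, int max)
{
	int full = max / CHAR_BIT;
	int rest = max % CHAR_BIT;

	if (memcmp(a, b, full))
		return 0;
	/* XOR exposes differing bits; the mask drops the out-of-range ones. */
	if (rest && ((a[full] ^ b[full]) & ((1u << rest) - 1)))
		return 0;
	return 1;
}
```

The extra masking is the price of supporting max < num_bits; the
second option avoids it but relies on the padding being zero.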

In either case, there needs to be another item in the "caller's
responsibility" list at the beginning of bitset.h:

    4. Ensure that padding bits at the end of the bitset array are
       initialized to 0.

In the description of bitset_sizeof(), the comment hints at this by using
xcalloc() in the example, but a careless user may be tempted to
implement bitset_clr() and then do:

        int i;
        unsigned char *bits = malloc(bitset_sizeof(nr));
        for (i = 0; i < nr; i++)
                bitset_clr(bits, i);
        assert(bitset_empty(bits, nr));

and the implementation of bitset_empty(), even if we rename
s/max/num_bits/, will choke if (nr % CHAR_BIT) is non-zero and
malloc() gave us a non-zero bit in the padding.
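
A self-contained sketch of that failure mode follows.  The helper
bodies are my assumptions about the shape of the code (a byte-wise
bitset_empty(), a bitset_sizeof() that rounds up to whole bytes), not
copies from the patch, and bitset_clr() is the hypothetical function
from above.  memset() with a caller-chosen fill byte stands in for
whatever garbage malloc() might return, so the result is
deterministic.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape: number of bytes needed to hold num_bits bits. */
static inline size_t bitset_sizeof(int num_bits)
{
	return (num_bits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Hypothetical; the patch deliberately does not provide this. */
static inline void bitset_clr(unsigned char *bits, int n)
{
	bits[n / CHAR_BIT] &= ~(1 << (n % CHAR_BIT));
}

/* Assumed shape: a byte-wise test, which also sees the padding bits. */
static inline int bitset_empty(const unsigned char *bits, int num_bits)
{
	size_t i;

	for (i = 0; i < bitset_sizeof(num_bits); i++)
		if (bits[i])
			return 0;
	return 1;
}

/*
 * Clear every valid bit of a buffer pre-filled with "fill" and report
 * what bitset_empty() says afterwards (-1 on allocation failure).
 */
static int empty_after_clearing(int nr, unsigned char fill)
{
	unsigned char *bits = malloc(bitset_sizeof(nr));
	int i, ret;

	if (!bits)
		return -1;
	memset(bits, fill, bitset_sizeof(nr));	/* simulated garbage */
	for (i = 0; i < nr; i++)
		bitset_clr(bits, i);
	ret = bitset_empty(bits, nr);
	free(bits);
	return ret;
}
```

With nr = 10 and a fill of 0xff, bits 10..15 of the second byte stay
set after the loop, so the set is reported as non-empty even though
every valid bit was cleared.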

For the sake of simplicity, I am inclined to vote for not allowing
their use on a partial-sub-bitset.