Re: [PATCH 09/13] refs: introduce an iterator interface

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Tue, 31 May 2016 09:59:04 +0200

On 05/31/2016 07:29 AM, Eric Sunshine wrote:
> On Mon, May 30, 2016 at 3:55 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> [...]
> [...]
> Either:
> 
>     s/false/something other than ITER_OK/
> 
> or:
> 
>     s/false/ITER_DONE or ITER_ERROR/

Thanks.

>> +int ref_iterator_advance(struct ref_iterator *ref_iterator);
>> +
>> +/*
>> + * An iterator over nothing (its first ref_iterator_advance() call
>> + * returns 0).
>> + */
> 
> s/0/ITER_DONE/

Thanks. I guess you can guess what an earlier draft of this interface
looked like :-)

>> +struct ref_iterator *empty_ref_iterator_begin(void);
>> +
>> +/*
>> + * Return true iff ref_iterator is an empty_ref_iterator.
>> + */
>> +int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
> 
> I can see that you used this function as an optimization or
> convenience in overlay_ref_iterator_begin(), but do you expect it to
> be generally useful otherwise? Is it worth publishing? Do you have
> other use-cases in mind?

It is only "published" within the refs module, in refs/refs-internal.h.
This header file is not meant to be used by code outside of the refs module.

My thinking was that it might be useful to other reference backends. The
function is pretty safe for anybody to call, though I admit that it is
not very general.

I don't have a strong feeling either way. If nobody else chimes in, I'll
remove it from the header file as you suggested. We can always add it
back if somebody needs it.

> Also, can you explain why the merge iterator doesn't also perform the
> optimization/convenience of checking if one iterator is an empty
> iterator?

That's because the merge iterator doesn't know what its select function
will do. For example, you could imagine an "intersect" select function
that only lets through references that were in *both* sub-iterators. In
that case, your suggested "optimization" would be incorrect.

Incidentally, that's also why I decided to leave the select function in
charge even after one or both of the sub-iterators is exhausted—because
it lets merge_ref_iterator implement more diverse behavior.

>> +/*
>> + * Iterate over the intries from iter0 and iter1, with the values
> 
> s/intries/entries/

Thanks.

>> + * interleaved as directed by the select function. The iterator takes
>> + * ownership of iter0 and iter1 and frees them when the iteration is
>> + * over.
>> + */
>> +struct ref_iterator *merge_ref_iterator_begin(
>> +               struct ref_iterator *iter0, struct ref_iterator *iter1,
>> +               ref_iterator_select_fn *select, void *cb_data);
>> +
>> +/*
>> + * An iterator consisting of the union of the entries from iter0 and
>> + * iter1. If there are entries common to the two sub-iterators, use
>> + * the one from iter1. Each iterator must iterate over its entries in
>> + * strcmp() order by refname for this to work.
>> + *
>> + * The new iterator takes ownership of its arguments and frees them
>> + * when the iteration is over. As a convenience to callers, if iter0
>> + * or iter1 is_empty_ref_iterator(), then abort that one immediately
>> + * and return the other iterator directly, without wrapping it.
>> + */
>> +struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
>> +                                               struct ref_iterator *iter1);
> 
> When reading about the overlay iterator (both code and documentation),
> my expectation was that iter0 would shadow iter1, not the other way
> around as implemented here. Of course, that's entirely subjective, but
> the generic names don't provide any useful clues as to which shadows
> which. Perhaps giving them more meaningful names would help.

That's a good idea. I also found myself having to refer back to the
documentation to remind myself which was which.

How about I rename them "back" and "front"? I will also reverse the
order of the arguments.

(But I will leave the names "iter0" and "iter1" in merge_ref_iterator,
and also the constants like ITER_SELECT_0, because these don't
necessarily have the interpretation of "back" and "front".)

>> +/*
>> + * Wrap iter0, only letting through the references whose names start
>> + * with prefix. If trim is set, set iter->refname to the name of the
>> + * reference with that many characters trimmed off the front;
>> + * otherwise set it to the full refname. The new iterator takes over
>> + * ownership of iter0 and frees it when iteration is over. It makes
>> + * its own copy of prefix.
>> + *
>> + * As an convenience to callers, if prefix is the empty string and
>> + * trim is zero, this function returns iter0 directly, without
>> + * wrapping it.
>> + */
>> +struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
>> +                                              const char *prefix,
>> +                                              int trim);
> 
> Minor: Similarly, when reading the code and documentation, I wondered
> why this was named 'iter0' when no 'iter1' was in sight. Perhaps name
> it simply 'iter'.

I found that it got a little bit confusing, because the constructor and
method implementations all use `iter` as a local variable. In particular
in the constructor there would want to be an argument "iter" and also
the local variable "iter" for the iterator being constructed, so a new
name would otherwise have to be invented for one or the other. Between
all the "iter" and "iter" and "iter->iter", I found that naming the
sub-iterator "iter0" made things a little bit less bewildering.

If you don't like that, we could name the embedded iterators something
like "subiter", "subiter0", and "subiter1". But the current convention
is a bit more succinct so I slightly prefer it.

Thanks for all your comments!

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html