Re: RFC: New reference iteration paradigm

David Turner <dturner@xxxxxxxxxxxxxxxx> · Thu, 31 Mar 2016 16:15:37 -0400



On Thu, 2016-03-31 at 18:13 +0200, Michael Haggerty wrote:
> Currently the way to iterate over references is via a family of
> for_each_ref()-style functions. You pass some arguments plus a
> callback
> function and cb_data to the function, and your callback is called for
> each reference that is selected.
> 
> This works, but it has two big disadvantages:
> 
> 1. It is cumbersome for callers. The caller's logic has to be split
>    into two functions, the one that calls for_each_ref() and the
>    callback function. Any data that have to be passed between the
>    functions has to be stuck in a separate data structure.
> 
> 2. This interface is not composable. For example, you can't write a
>    single function that iterates over references from two sources,
>    as is interesting for combining packed plus loose references,
>    shared plus worktree-specific references, symbolic plus normal
>    references, etc. The current code for combining packed and loose
>    references needs to walk the two reference trees in lockstep,
>    using intimate knowledge about how references are stored [1,2,3].
> 
> I'm currently working on a patch series to transition the reference
> code
> from using for_each_ref()-style iteration to using proper iterators.
> 
> The main point of this change is to change the base iteration
> paradigm
> that has to be supported by reference backends. So instead of
> 
> > int do_for_each_ref_fn(const char *submodule, const char *base,
> >                        each_ref_fn fn, int trim, int flags,
> >                        void *cb_data);
> 
> the backend now has to implement
> 
> > struct ref_iterator *ref_iterator_begin_fn(const char *submodule,
> >                                            const char *prefix,
> >                                            unsigned int flags);
> 
> The ref_iterator itself has to implement two main methods:
> 
> > int iterator_advance_fn(struct ref_iterator *ref_iterator);
> > void iterator_free_fn(struct ref_iterator *ref_iterator);
> 
> A loop over references now looks something like
> 
> > struct ref_iterator *iter = each_ref_in_iterator("refs/tags/");
> > while (ref_iterator_advance(iter)) {
> >         /* code using iter->refname, iter->oid, iter->flags */
> > }
> 
> I built quite a bit of ref_iterator infrastructure to make it easy to
> plug things together quite flexibly. For example, there is an
> overlay_ref_iterator which takes two other iterators (e.g., one for
> packed and one for loose refs) and overlays them, presenting the
> result
> via the same iterator interface. But the same overlay_ref_iterator
> can
> be used to overlay any two other iterators on top of each other.

I haven't looked at the code yet, but this makes sense to me.  In
general, the major reason to supply a callback style of API is when
iteration is more complicated than whatever will be consuming the data
(I can't remember where I heard this argument, but I found it pretty
convincing).  It seems like this is increasingly not the case, so we
should move towards the iterator style.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html