Jeff King <peff@xxxxxxxx> writes: > On Thu, Mar 31, 2016 at 11:01:44AM -0700, Junio C Hamano wrote: >> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes: >> >>> the backend now has to implement >>> >>>> struct ref_iterator *ref_iterator_begin_fn(const char *submodule, >>>> const char *prefix, >>>> unsigned int flags); The problem with callbacks, including their lack of composability, is often solved in high-level languages by using Promises / Futures (or Channels / Supplies). But I think in this case implementing a straightforward Iterator pattern would be a better and simpler solution, especially in C. >>> The ref_iterator itself has to implement two main methods: >>> >>>> int iterator_advance_fn(struct ref_iterator *ref_iterator); >>>> void iterator_free_fn(struct ref_iterator *ref_iterator); >>> >>> A loop over references now looks something like >>> >>>> struct ref_iterator *iter = each_ref_in_iterator("refs/tags/"); >>>> while (ref_iterator_advance(iter)) { >>>> /* code using iter->refname, iter->oid, iter->flags */ >>>> } >> >> We'd want to take advantage of the tree-like organization of the >> refs (i.e. refs/tags/a and refs/tags/b sit next to each other and >> they are closer to each other than they are to refs/heads/a) so that >> a request "I want to iterate only over tags, even though I may have >> millions of other kinds of refs" can be done with cost that is >> proportional to how many tags you have. >> >> The current implementation of for_each_tag_ref() that goes down to >> do_for_each_entry() in files-backend.c has that propertly, and the >> new iteration mechanism with the above design seems to keep it, >> which is very nice. > > Actually, that is a slight fiction. :) > > We traverse only the loose ref directories we need, but we populate the > entire packed-refs tree in one go. That's fine if you have a reasonable > number of refs and care about the cold-cache case (where you care a lot > about hitting directories you don't need to, but reading the entire > packed-refs file isn't a big deal). But it's really bad if you have an > 800MB packed-refs file, as looking up one tiny subset of the entries > wastes a lot of RAM and CPU pulling that into our internal > representation[1]. > > At one point I wrote a patch to binary search the packed-refs file, find > the first "refs/tags/" entry, and then walk linearly through there. What > stopped me is that the current refs.c code (I guess file-backend.c these > days) was not happy with me partially filling in the ref_dir structs in > this "inside out" way. [...] Isn't this what reftable - an alternative way of storing refs in Git, tested by being used by JGit - would solve? See Christian Couder post "Implementing reftable in Git" https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@xxxxxxxxxxxxxx/t/#u 'Efficient lookup of an entire namespace, such as refs/tags/' is allegedly one of the objectives of this format. Regards, -- Jakub Narębski