On Tue, Jun 13, 2023 at 05:27:05PM -0700, Junio C Hamano wrote: > > - Clean up the results from the previous step: discard empty regions, > > and combine adjacent regions. > > Say something like > > The resulting list of regions that would never contain refs that > are not excluded is called the "jump list". > > here, probably. Otherwise the first reference to "jump list" we see > later feels a bit too abrupt. Good suggestion. I phrased it slightly differently in the version that I'll send shortly, but the spirit is the same. > > There are a few other gotchas worth considering. First, note that the > > jump list is sorted, so once we jump past a region, we can avoid > > considering it (or any regions preceding it) again. The member > > `jump_pos` is used to track the first next-possible region to jump > > through. > > > > Second, note that the exclusion list is best-effort, since we do not > > handle loose references, and because of the meta-character issue above. > > I found this a bit misleading; a natural reading of "exclusion list" > is that the phrase refers to the list of exclude patterns given from > the command line, and users would be upset if the processing of it > is best effort. > > I think what you meant to say was that optimization to avoid full > scan using the jump list does not aim for perfection, and entries > that are not skipped using the jump list may still be excluded by > the exclude patterns. Yep, exactly. I think this was from an earlier version from before this was called "jump lists", and I missed it when find-and-replacing through the patch messages. Thanks for spotting. > > In repositories with a large number of hidden references, the speed-up > > "hidden" -> "excluded". Your final objective to implement the > feature to exclude refs using patterns and optimize it using the > jump list data may be to implement "hidden references", but that > hasn't be talked about in the above. All we have been hearing was > that we are optimizing the walk over packed-refs using exclude > patterns. Yep, thanks. > > +static const char *ptr_max(const char *x, const char *y) > > +{ > > + if (x > y) > > + return x; > > + return y; > > +} > > Hopefully the compiler would inline the function without being told. > > These pointers point into the same mmapped region of contiguous > memory that holds the contents of the packed-refs file, so > comparison between them is always defined. Good. > > I wondered if > > return (x > y) ? x : y; > > is easier to read, simply because it treats both cases more equally > (in other words, as written, (x>y) appears more "special"), but that > is minor. Yeah, I think that any reasonable compiler would almost certainly inline this, especially at higher optimization levels. But I agree with your suggestion nonetheless, thanks. > > + /* > > + * We can't feed any excludes with globs in them to the > > + * refs machinery. It only understands prefix matching. > > + * We likewise can't even feed the string leading up to > > + * the first meta-character, as something like "foo[a]" > > + * should not exclude "foobar" (but the prefix "foo" > > + * would match that and mark it for exclusion). > > + */ > > + if (has_glob_special(*pattern)) > > + continue; > > OK. When we have a "literal" exclude pattern and another "glob" > exclude pattern, we can afford to ignore the "glob" one when > building the jump list and include only the "literal" one, because > the jump list is used only to skip over entries that obviously can > never be in the result, *and* --exclude are additive (i.e. being > on jump list because of the "literal" pattern is a reason enough to > be excluded from the result; matching or not matching the other > patterns does not affect the fate of the ref that got excluded due > to matching the "literal" pattern). > > Makes sense. Exactly. Thanks, Taylor