On Fri, Sep 13, 2024 at 07:47:06AM -0500, karthik nayak wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > +/* > > + * Handle exclude patterns. Returns either `1`, which tells the caller that the > > + * current reference shall not be shown. Or `0`, which indicates that it should > > + * be shown. > > + */ > > +static int should_exclude_current_ref(struct reftable_ref_iterator *iter) > > +{ > > + while (iter->exclude_patterns[iter->exclude_patterns_index]) { > > + const char *pattern = iter->exclude_patterns[iter->exclude_patterns_index]; > > + char *ref_after_pattern; > > + int cmp; > > + > > + /* > > + * Lazily cache the pattern length so that we don't have to > > + * recompute it every time this function is called. > > + */ > > + if (!iter->exclude_patterns_strlen) > > + iter->exclude_patterns_strlen = strlen(pattern); > > + > > + /* > > + * When the reference name is lexicographically bigger than the > > + * current exclude pattern we know that it won't ever match any > > + * of the following references, either. We thus advance to the > > + * next pattern and re-check whether it matches. > > So this means that the exclude patterns were lexicographically sorted. > Otherwise this would work. Indeed. Good that you call out my assumption, as I in fact didn't verify that it holds, and in fact it doesn't. It's not a correctness issue if it doesn't hold, because it would simply mean that we don't skip over some references where we really could. But it certainly is a perfomance issue. Will fix and add a test for it. > > + * Note that the seeked-to reference may also be excluded. This > > + * is not handled here though, but the caller is expected to > > + * loop and re-verify the next reference for us. > > + */ > > The seeked-to reference here being the one with 0xff. We could get rid > of this by doing something like this: > > int last_char_idx = iter->exclude_patterns_strlen - 1 > ref_after_pattern = xstrfmt("%s", pattern); > ref_after_pattern[last_char_idx] = ref_after_pattern[last_char_idx] + 1; > > instead no? Sorry, I don't quite follow what you mean with "get rid of this". What exactly is "this"? Do you mean the re-looping? If so then the above doesn't fix it, no. We'd have to repeat a whole lot of code here to also retrieve the next entry, store it into `iter->ref`, check whether it is an actual ref starting with "refs/" and so on. Looping once very much feels like the better thing to do. > > @@ -580,9 +660,45 @@ static struct ref_iterator_vtable reftable_ref_iterator_vtable = { > > .abort = reftable_ref_iterator_abort > > }; > > > > +static char **filter_exclude_patterns(const char **exclude_patterns) > > +{ > > + size_t filtered_size = 0, filtered_alloc = 0; > > + char **filtered = NULL; > > + > > + if (!exclude_patterns) > > + return NULL; > > + > > + for (size_t i = 0; ; i++) { > > + const char *exclude_pattern = exclude_patterns[i]; > > + int has_glob = 0; > > + > > + if (!exclude_pattern) > > + break; > > + > > + for (const char *p = exclude_pattern; *p; p++) { > > + has_glob = is_glob_special(*p); > > + if (has_glob) > > + break; > > + } > > Why do we need to filter excludes here? Don't the callee's already do > something like this? No, it doesn't. The code for exclude patterns is structured in such a way that the responsibility is with the backend to decide what it can and cannot filter. In theory there could be a backend that can exclude refs based on globs efficiently, even though neither the "files" nor the "reftable" backend can. Patrick