Patrick Steinhardt <ps@xxxxxx> writes: > On Fri, Sep 13, 2024 at 07:47:06AM -0500, karthik nayak wrote: >> Patrick Steinhardt <ps@xxxxxx> writes: >> > +/* >> > + * Handle exclude patterns. Returns either `1`, which tells the caller that the >> > + * current reference shall not be shown. Or `0`, which indicates that it should >> > + * be shown. >> > + */ >> > +static int should_exclude_current_ref(struct reftable_ref_iterator *iter) >> > +{ >> > + while (iter->exclude_patterns[iter->exclude_patterns_index]) { >> > + const char *pattern = iter->exclude_patterns[iter->exclude_patterns_index]; >> > + char *ref_after_pattern; >> > + int cmp; >> > + >> > + /* >> > + * Lazily cache the pattern length so that we don't have to >> > + * recompute it every time this function is called. >> > + */ >> > + if (!iter->exclude_patterns_strlen) >> > + iter->exclude_patterns_strlen = strlen(pattern); >> > + >> > + /* >> > + * When the reference name is lexicographically bigger than the >> > + * current exclude pattern we know that it won't ever match any >> > + * of the following references, either. We thus advance to the >> > + * next pattern and re-check whether it matches. >> >> So this means that the exclude patterns were lexicographically sorted. >> Otherwise this would work. > > Indeed. Good that you call out my assumption, as I in fact didn't verify > that it holds, and in fact it doesn't. It's not a correctness issue if > it doesn't hold, because it would simply mean that we don't skip over > some references where we really could. But it certainly is a perfomance > issue. > > Will fix and add a test for it. > >> > + * Note that the seeked-to reference may also be excluded. This >> > + * is not handled here though, but the caller is expected to >> > + * loop and re-verify the next reference for us. >> > + */ >> >> The seeked-to reference here being the one with 0xff. We could get rid >> of this by doing something like this: >> >> int last_char_idx = iter->exclude_patterns_strlen - 1 >> ref_after_pattern = xstrfmt("%s", pattern); >> ref_after_pattern[last_char_idx] = ref_after_pattern[last_char_idx] + 1; >> >> instead no? > > Sorry, I don't quite follow what you mean with "get rid of this". What > exactly is "this"? Do you mean the re-looping? > > If so then the above doesn't fix it, no. We'd have to repeat a whole lot > of code here to also retrieve the next entry, store it into `iter->ref`, > check whether it is an actual ref starting with "refs/" and so on. > Looping once very much feels like the better thing to do. > I definitely responded to the wrong chunk of your comment. I meant to respond to > * This is done by appending the highest possible character to > * the pattern. Consequently, all references that have the > * pattern as prefix and whose suffix starts with anything in > * the range [0x00, 0xfe] are skipped. And given that 0xff is a > * non-printable character that shouldn't ever be in a ref name, > * we'd not yield any such record, either. My point being, we can even skip "pattern + 0xff" by just setting the seek to ref to be the next character in the pattern. But since "pattern + 0xff" is anyways not an expected ref. They should behave the same. >> > @@ -580,9 +660,45 @@ static struct ref_iterator_vtable reftable_ref_iterator_vtable = { >> > .abort = reftable_ref_iterator_abort >> > }; >> > >> > +static char **filter_exclude_patterns(const char **exclude_patterns) >> > +{ >> > + size_t filtered_size = 0, filtered_alloc = 0; >> > + char **filtered = NULL; >> > + >> > + if (!exclude_patterns) >> > + return NULL; >> > + >> > + for (size_t i = 0; ; i++) { >> > + const char *exclude_pattern = exclude_patterns[i]; >> > + int has_glob = 0; >> > + >> > + if (!exclude_pattern) >> > + break; >> > + >> > + for (const char *p = exclude_pattern; *p; p++) { >> > + has_glob = is_glob_special(*p); >> > + if (has_glob) >> > + break; >> > + } >> >> Why do we need to filter excludes here? Don't the callee's already do >> something like this? > > No, it doesn't. The code for exclude patterns is structured in such a > way that the responsibility is with the backend to decide what it can > and cannot filter. In theory there could be a backend that can exclude > refs based on globs efficiently, even though neither the "files" nor the > "reftable" backend can. Thanks for the explanataion.
Attachment:
signature.asc
Description: PGP signature