Hi Stolee, On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > The iterated search in find_cache_entry() was recently modified to > include a loop that searches backwards for a sparse directory entry that > matches the given traverse_info and name_entry. However, the string > comparison failed to actually concatenate those two strings, so this > failed to find a sparse directory when it was not a top-level directory. > > This caused some errors in rare cases where a 'git checkout' spanned a > diff that modified files within the sparse directory entry, but we could > not correctly find the entry. Good explanation. I wonder a bit about the performance impact. How "hot" is this function? I.e. how often is it called, on average? I ask because I see opportunities to optimize in both directions: it could be written more concisely (if speed does not matter as much), and it could be made faster (if speed matters a lot). See below for more. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > unpack-trees.c | 18 +++++++++++++----- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/unpack-trees.c b/unpack-trees.c > index 5786645f315..df1f4437723 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1255,9 +1255,10 @@ static int sparse_dir_matches_path(const struct cache_entry *ce, > static struct cache_entry *find_cache_entry(struct traverse_info *info, > const struct name_entry *p) > { > - struct cache_entry *ce; > + struct cache_entry *ce = NULL; Makes sense: since you need to release the allocated memory, you can no longer `return NULL` early, but have to break out of the loop and return `ce`. > int pos = find_cache_pos(info, p->path, p->pathlen); > struct unpack_trees_options *o = info->data; > + struct strbuf full_path = STRBUF_INIT; > > if (0 <= pos) > return o->src_index->cache[pos]; > @@ -1273,6 +1274,10 @@ static struct cache_entry *find_cache_entry(struct traverse_info *info, > if (pos < 0 || pos >= o->src_index->cache_nr) > return NULL; > > + strbuf_addstr(&full_path, info->traverse_path); > + strbuf_add(&full_path, p->path, p->pathlen); > + strbuf_addch(&full_path, '/'); This could be reduced to: strbuf_addf(&full_path, "%s%.*s/", info->traverse_path, (int)p->pathlen, p->path); But if speed matters, we probably need something more like this: size_t full_path_len; const char *full_path; char *full_path_1 = NULL; if (!*info->traverse_path) { full_path = p->path; full_path_len = p->pathlen; } else { size_t len = strlen(info->traverse_path); full_path_len = len + p->pathlen + 1; full_path = full_path_1 = xmalloc(full_path_len + 1); memcpy(full_path_1, info->traverse_path, len); memcpy(full_path_1 + len, p->path, p->pathlen); full_path_1[full_path_len - 1] = '/'; full_path_1[full_path_len] = '\0'; } [...] free(full_path_1); It would obviously be much nicer if we did not have to go for that ugly long version... > + > /* > * Due to lexicographic sorting and sparse directory > * entries ending with a trailing slash, our path as a > @@ -1283,17 +1288,20 @@ static struct cache_entry *find_cache_entry(struct traverse_info *info, > while (pos >= 0) { > ce = o->src_index->cache[pos]; > > - if (strncmp(ce->name, p->path, p->pathlen)) > - return NULL; > + if (strncmp(ce->name, full_path.buf, full_path.len)) { > + ce = NULL; > + break; > + } > > if (S_ISSPARSEDIR(ce->ce_mode) && > sparse_dir_matches_path(ce, info, p)) > - return ce; > + break; > > pos--; > } > > - return NULL; > + strbuf_release(&full_path); > + return ce; > } > > static void debug_path(struct traverse_info *info) > -- > gitgitgadget > >