On 2/1/2021 3:50 PM, Elijah Newren wrote: > On Mon, Jan 25, 2021 at 9:42 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> It is important to be careful about the trailing directory separator >> that exists in the sparse directory entries but not in the subtree >> paths. > > The comment makes me wonder if leaving the trailing directory > separator out would be better, as it'd allow direct comparisons. Of > course, you have a better idea of what is easier or harder based on > this decision. Is there any chance you have a quick list of the > places that the code was simplified by this decision and a list of > places like this one that were made slightly harder? I'm going through all of your comments and making notes about areas to fix and clean up before starting a new series for full review. This question of the trailing slash is important, and I will take particular care about answering it as I rework the series. However, the questions in this patch poke at the right places... >> + /* remove directory separator if a sparse directory entry */ >> + if (S_ISSPARSEDIR(ce)) >> + ce_len--; > > Here's where your comment about trailing separator comes in; makes sense. > >> return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode); >> } >> >> @@ -989,6 +999,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf >> if (cmp) >> return cmp; >> >> + /* If ce is a sparse directory, then allow equality here. */ >> + if (S_ISSPARSEDIR(ce)) >> + return 0; >> + > > This seems surprising to me. Is there a chance you are comparing > sparse directory A with sparse directory B and you return with > equality? Or sparse_directory A with regular file B? Do the callers > still do the right thing? If your code change here is right, it seems > like it deserves an extra comment either in the code or the commit > message. Sometimes a caller is asking for the first index entry corresponding to a directory. In these cases, the input could be "A/B/C/". We want to ensure that a sparse directory entry corresponding exactly to that directory is correctly matched. If we place "A/B/C" in the index instead, this search becomes more complicated (I think; I will justify this more after thinking about it). At this point in time, we are just saying "We found the entry with equal path value!" and not failing with the check in the rest of the method: /* * Even if the beginning compared identically, the ce should * compare as bigger than a directory leading up to it! */ return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n)); >> - if (traverse_trees_recursive(n, dirmask, mask & ~dirmask, >> + if (recurse && >> + traverse_trees_recursive(n, dirmask, mask & ~dirmask, >> names, info) < 0) >> return -1; >> return mask; > > The unpack_callback() code has some comparison to a cache-tree, but > I'd assume that you'd need to update cache-tree.c somewhat to take > advantage of these sparse directory entries. Am I wrong, and you just > get cache-tree.c working with sparse directory entries for free? Or > is this something coming in a later patch? In the RFC, I integrate the cache-tree with the sparse-index at the very end. I will move that integration to be much earlier in the next submission, so it becomes part of the format discussion. Thanks, -Stolee