On 1/26/2021 10:05 PM, Elijah Newren wrote:
> On Mon, Jan 25, 2021 at 9:42 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
...
>> Sparse directory entries have a specific 'ce_mode' value. The macro
>> S_ISSPARSEDIR(ce) can check if a cache_entry 'ce' has this type. This
>> ce_mode is not possible with the existing index formats, so we don't
>> also verify all properties of a sparse-directory entry, which are:
>>
>> 1. ce->ce_mode == 01000755
>
> This is a weird number. What's the reason for choosing it? It looks
> deceptively close to 0100755, normal executable files, but has the
> extra 0, meaning that ce->ce_mode & S_IFMT is 0, suggesting it has no
> file type.
>
> Since it's a directory, why not use S_IFDIR (040000)?
>
> (GITLINK does use the weird 0160000 value, but it happens to be
> S_IFLNK | S_IFDIR == 0120000 | 040000, which conveys "it's both a
> directory and a symlink")

I forget how exactly I came up with these magic constants, but then
completely forgot to think of them critically because I haven't had
to look at them in a while. They _are_ important, especially because
these values affect the file format itself. I'll think harder on this
before submitting a series intended for merging.

>> 2. ce->flags & CE_SKIP_WORKTREE is true
>
> Makes sense.
>
>> 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
>
> Is there a particular reason for this? I'm used to seeing names
> without the trailing slash, both in the index and in tree objects. I
> don't know enough to be for or against this idea; just curious at this
> point.

It's yet another way to distinguish directories from files, but there
are cases where we do string searches up to a prefix, and having these
directory separators did help, IIRC.

>> 4. ce->oid references a tree object.
>
> Makes sense...but doesn't that suggest we'd want to use ce->ce_mode = 040000?

...

>> +#define CE_MODE_SPARSE_DIRECTORY 01000755
>> +#define SPARSE_DIR_MODE 0100
>
> Another magic value. Feels like the commit message should reference
> this one and why it was picked. Seems odd to me, and possibly
> problematic to re-use file permission bits that might collide with
> files recorded by really old versions of git. Maybe that's not a
> concern, though.
>
>> +#define S_ISSPARSEDIR(m) ((m)->ce_mode == CE_MODE_SPARSE_DIRECTORY)
>
> Should the special sauce apply to ce_flags rather than ce_mode? Thus,
> instead of an S_ISSPARSEDIR, perhaps have a ce_sparse_dir macro
> (similar to ce_skip_worktree) based on a CE_SPARSE_DIR value (similar
> to CE_SKIP_WORKTREE)?
>
> Or, alternatively, do we need a single special state here? Could we
> check for a combination of ce_mode == 040000 && ce_skip_worktree(ce)?

The intention was that ce_mode be a unique value that could only be
assigned to a directory entry, which would then by necessity be sparse.
Checking both ce_mode and ce_flags seemed wasteful with the given
assumptions

...

>> + /* Copy back into original index. */
>> + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
>> + istate->sparse_index = 0;
>> + istate->cache = full->cache;
>
> Haven't you leaked the original istate->cache here?

Yes, seems so. Will fix.

Thanks,
-Stolee
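
For reference, the mode arithmetic discussed above can be checked with
a small standalone sketch. This is only an illustration, not code from
the series: the DEMO_ names and demo_file_type() are hypothetical, and
the S_IFMT and S_IFREG values (0170000 and 0100000) are the
conventional POSIX ones, assumed here rather than quoted in the thread.

```c
/*
 * Sketch of the mode-bit arithmetic from the review. All DEMO_ names
 * are hypothetical; the octal values for S_IFDIR, S_IFLNK, the gitlink
 * mode and the proposed sparse-directory mode are the ones quoted in
 * the thread. DEMO_S_IFMT and DEMO_S_IFREG are the conventional POSIX
 * values, which is an assumption of this sketch.
 */
#define DEMO_S_IFMT          0170000u  /* file-type mask (assumed) */
#define DEMO_S_IFREG         0100000u  /* regular file (assumed) */
#define DEMO_S_IFDIR         0040000u  /* directory */
#define DEMO_S_IFLNK         0120000u  /* symlink */
#define DEMO_GITLINK_MODE    0160000u  /* gitlink, as quoted */
#define DEMO_EXEC_FILE_MODE  0100755u  /* normal executable file */
#define DEMO_SPARSE_DIR_MODE 01000755u /* value proposed in the patch */

/* Extract the file-type bits of a mode, as "mode & S_IFMT" would. */
static unsigned int demo_file_type(unsigned int mode)
{
	return mode & DEMO_S_IFMT;
}

/*
 * The alternative raised in the review: treat an entry as a sparse
 * directory when its type bits say "directory" and the skip-worktree
 * flag is set, instead of relying on one special mode value.
 */
static int demo_is_sparse_dir(unsigned int mode, int skip_worktree)
{
	return demo_file_type(mode) == DEMO_S_IFDIR && skip_worktree;
}
```

With these definitions, demo_file_type(DEMO_SPARSE_DIR_MODE) is 0 (no
file type), while DEMO_S_IFLNK | DEMO_S_IFDIR equals the gitlink mode,
which is the asymmetry pointed out in the review.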