On 5/20/2022 2:17 PM, Junio C Hamano wrote:> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid, >> struct strbuf *base, const char *path, >> unsigned int mode, void *context) >> { > > OK, this function is a callback of read_tree_at(), whose caller is > walking the source (=current) index, and while it is copying > existing non-directory entries to the destination (=ctx->write) > index, it found a tree in the source in-core index that does not > match the cone pattern (or we are expanding fully). Our job is to > expand the given tree as a "subdirectory" in the project into the > destination index. > > We used to just let the calling read_tree_at() fully and recursively > walk the tree while adding any non-tree entries to the destination. > Now, we have the pattern list in the context, we go more selective. > > The design makes sense. > >> + struct modify_index_context *ctx = (struct modify_index_context *)context; >> struct cache_entry *ce; >> size_t len = base->len; >> >> + if (S_ISDIR(mode)) { >> + int dtype; >> + size_t baselen = base->len; >> + if (!ctx->pl) >> + return READ_TREE_RECURSIVE; > > Fully recursive case. We can do without any match, just like the > caller (i.e. expand_to_pattern_list) did, when the pattern list is > NULL. > >> + /* >> + * Have we expanded to a point outside of the sparse-checkout? >> + */ >> + strbuf_addstr(base, path); >> + strbuf_add(base, "/-", 2); >> + >> + if (path_matches_pattern_list(base->buf, base->len, >> + NULL, &dtype, >> + ctx->pl, ctx->write)) { > > If that sample path in the directory matches (MATCHED or > MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand. > > Can the function return UNDECIDED here? If so what should happen? > As written, the code will behave as if it matched, and it is not > quite clear if that is the behaviour intended by this patch or an > accident. UNDECIDED can only be returned when not in cone mode. We are in cone mode when this helper is used. >> + strbuf_setlen(base, baselen); >> + return READ_TREE_RECURSIVE; >> + } > > The caller found a tree at path <path> in the index. We check if > our patterns match a fictitious path <path> + "/-", which may exist > if the <path> is a directory and if there is a funny named file or > directory "-" in it. > > Why "-"? Are we trying ot see if "ctx->pl" matches "anything" in > the directory that is at <path>? Is the assumption here that pl > only names directories literally without blobs (I presume that is > the same thing as assuming the cone mode)? > > I am trying to see if there is a way that expresses the intention of > this code more clearly than using an arbitrary path "-" (and trying > to figure out if I got the intention right in the first place ;-).. The '+ "/-"' is there to give us an example file path for "<path>" as a directory. This is specifically because path_matches_pattern_list() checks the parent directories for matches, expecting the input to be a file name. This could use a comment. I'm thinking... /* * Have we expanded to a point outside of the sparse-checkout? * * Artificially pad the path name with a slash "/" to * indicate it as a directory, and add an arbitrary file * name ("-") so we can consider base->buf as a file name * to match against the cone-mode patterns. * * If we compared just "path", then we would expand more * than we should. Since every file at root is always * included, we would expand every directory at root at * least one level deep instead of using sparse directory * entries. */ >> + /* >> + * The path "{base}{path}/" is a sparse directory. Create the correct >> + * name for inserting the entry into the index. >> + */ >> + strbuf_setlen(base, base->len - 1); > > This removes that phoney "-" while keeping the trailing "/". Just > like "-" was unclear, understanding this "- 1" requires that the > reader understands why "/-" was used earlier. > > The resulting "base" is used in the newly created cache entry to > represent the (unexpanded) directory below, and such a cache entry > is supposed to have a path with a trailing slash, so it makes sense. > >> + } else { >> + strbuf_addstr(base, path); > > For any non-tree thing, we use the given path for the cache entry to > represent it. > >> + } >> + >> + ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0); >> ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED; >> - set_index_entry(istate, istate->cache_nr++, ce); >> + set_index_entry(ctx->write, ctx->write->cache_nr++, ce); > > And the cache entry (newly created) is added to the destination. > Unlike what happens in the caller (i.e. expand_to_pattern_list) that > moves the cache entry taken from the source index to the destination > index, the caller will discard the cache entry taken from the source > index after read_tree_at() returns, and we create a new instance for > ourselves here, even if we _could_ have reused it (by passing it in > the context structure, for example). > >> strbuf_setlen(base, len); > > And we restore the length of the path in the base to the original > before we return. > >> return 0; > > Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've > dealt with the entry; don't recurse into subtree even if you called > us with a tree", which is the right thing to do here, as we did have > done all we need to here. > > OK, except for that "/-" thing, all of the above makes sense to me. Thanks for your careful read. Hopefully my explanation of the "/-" thing helps. Thanks, -Stolee