On Sun, Mar 21 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >> A re-roll of v3 of this series[1] and on top of (and requires) my >> just-submitted v5 re-roll of the read_tree() refactoring series[2]. >> >> There was a regression in 1/32 of the old series. Removing the >> canon_mode() call in diff.c didn't account for us needing to >> canonicalize "diff --no-index" modes. There were no tests for this, >> and it failed or not depending on the FS modes in the git.git checkout >> being tested. This fixes the CI smoke coming from this series. > > Sorry, but quite honestly, I am not quite sure what value this > entire code churn is trying to add to the codebase. > > The function signature of read_tree_fn_t callback function gets > changed from the mode bits (which is capable to express differences > between regular files, executable files and symbolic links) to "enum > object_type" (which can only say "this is a blob"), which is a > regression, no? > > A callback can no longer do things like this, for example: > > static int add_path_to_index(const struct object_id *oid, > struct strbuf *base, const char *path, > unsigned int mode, void *context) > { > struct index_state *istate = (struct index_state *)context; > struct cache_entry *ce; > size_t len = base->len; > > if (S_ISDIR(mode)) > return READ_TREE_RECURSIVE; > > strbuf_addstr(base, path); > > ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); > ce->ce_flags |= CE_SKIP_WORKTREE; > set_index_entry(istate, istate->cache_nr++, ce); > > strbuf_setlen(base, len); > return 0; > } > > where executableness or symlinkshood is lost. Yes, that would be a serious regression. I agree that all these functions/callbacks etc. should have a way to get at the mode bits. I'm adding "enum object_type", not removing the "mode" parameter in read_tree_fn_t. This function (which is in "seen" as 03316f20347 (sparse-index: implement ensure_full_index(), 2021-03-16)) works just fine in combination with this series. The other APIs modified here all retain the ability to give you the mode bit, they (the tree-walk.h changes) just optionally give you the option of getting just the type (or just the path), and as it turns out most users of the API can be converted over to that. > This probably is the third time I caught similar "let's lose > information passed through the call chain as nobody seems to need > it" mistakes in the iterations of this series, and that is two times > too many. We should learn from our earlier mistakes---tweaking of > the API that happens to be still OK with the current codebase can be > either a needless churn that loses useful expressiveness from the > API, or a useful clean-up to kill dead parameter or excess > flexibility. > > And these three incidents we have seen so far are the former. The current codebase will allow you to stick arbitrary mode bits in trees, we have an fsck check to prevent that which doesn't work. I had a summary of this in v1, but should probably have provided a recap[1]. This series is an admittedly long journey towards fixing that. I've got unsubmitted patchen on top that make that fsck check work again. I think the root cause of these bugs and other ones I've found along the way (some of which I'm not quite comfortable discussing the details of on the public list yet) is that the tree walking API is unnecessarily low-level for most callers. Most of those callers don't care about the details of the mode bits, but are just traversing a tree and doing something with one the object types. As opposed to having a mode, but do you want a mode as-is from a tree, normalized to canon_mode() (for writing?) etc. I think being able to clearly tell apart those callers from the simpler ones is a clear win. So I'm hoping you'll bear with me & this series, sorry about the breakages so far, in my slight defense they were all subtle testing blind spots (but we now have tests!). 1. https://lore.kernel.org/git/20210308150650.18626-1-avarab@xxxxxxxxx/