On Thu, Sep 7, 2023 at 11:07 AM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Josip Sokcevic <sokcevic@xxxxxxxxxx> writes: > > > diff --git a/diff-lib.c b/diff-lib.c > > index d8aa777a73..664613bb1b 100644 > > --- a/diff-lib.c > > +++ b/diff-lib.c > > @@ -39,11 +39,22 @@ > > static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) > > { > > assert(is_fsmonitor_refreshed(istate)); > > Not a problem this patch introduces, but doesn't this call path > > diff_cache() > -> unpack_trees() > -> oneway_diff() > -> do_oneway_diff() > -> show_new_file(), show_modified() > -> get_stat_data() > -> check_removed() > > violate the assertion? If so, perhaps we should rewrite it into a > more explicit "if (...) BUG(...)" that is not compiled away. True, I will update it. > > > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { > > - if (!is_missing_file_error(errno)) > > - return -1; > > - return 1; > > + if (ce->ce_flags & CE_FSMONITOR_VALID) { > > + /* > > + * Both check_removed() and its callers expect lstat() to have > > + * happened and, in particular, the st_mode field to be set. > > + * Simulate this with the contents of ce. > > + */ > > + memset(st, 0, sizeof(*st)); > > It is true that the original, when CE_FSMONITOR_VALID bit is set, > bypasses lstat() altogether and leaves the contents of st completely > uninitialized, but this is still way too insufficient, isn't it? > > There are three call sites of the check_removed() function. > > * The first one in run_diff_files() only cares about st.st_mode and > other members of the structure are not looked at. This makes > readers wonder if the "st" parameter to check_removed() should > become "mode_t *st_mode" to clarify this point, but the primary > thing I want to say is that this caller will not mind if we leave > other members of st bogus (like 0-bit filled) as long as the mode > is set correctly. > > * The second one in run_diff_files() passes the resulting &st to > match_stat_with_submodule(), which in turn passes it to > ie_match_stat(), which cares about "struct stat" members that are > used for quick change detection, like owner, group, mtime. > Giving it a bogus st will most likely cause it to report a > change. > > * The third one is in get_stat_data(). This also uses the &st to > call match_stat_with_submodule(), so it is still totally broken > to give it a bogus st, the same way as the second caller above. > > > + st->st_mode = ce->ce_mode; > > Does this work correctly when the cache entry points at a gitlink, > which uses 0160000 that is not a valid st_mode? I think you'd want > to use a reverse function of create_ce_mode(). I realized this too but after I sent the patch. I don't think we have a good way to reverse it, so the best we can do is to guess it. But I don't think we should do that. Instead, should we just zero the struct and add a TODO? Alternative could be to use a double pointer for stat and do more checks in call sites, but I'm not familiar with the codebase to how that branching would need to look like. Any preference? -- Josip Sokcevic