Re: [PATCH v4 0/7] add/rm: honor sparse checkout and warn on sparse paths

On Wed, Apr 14, 2021 at 1:36 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Thu, Apr 8, 2021 at 1:41 PM Matheus Tavares
> <matheus.bernardino@xxxxxx> wrote:
> >
> > Make `rm` honor sparse checkouts, and make both `rm` and `add` warn
> > when asked to update sparse entries.
> >
> > There are two changes since v3:
> >
> > - `test_i18ncmp` and `test_i18ngrep` were replaced by `test_cmp` and
> >   `grep`
> >
> > - The flag added in patch 5 now makes refresh_index() completely ignore
> >   skip_worktree entries, instead of just suppressing their matches on
> >   the seen[] array. The previous implementation was not necessarily
> >   wrong but, as Junio pointed out, it was rather odd to keep matching
> >   the entries if we no longer want to use the matches.
> >
> >   As "side effects", the new version of the flag also makes
> >   refresh_index() refrain from both:
> >
> >   (1) checking and warning if skip_worktree entries matching the given
> >   pathspec are unmerged.
> >
> >   (2) marking skip_worktree entries matching the given pathspec with
> >   CE_UPTODATE.
> >
> >   The change (1) is actually interesting because `git add` doesn't
> >   update skip_worktree entries, and thus, it doesn't make much sense to
> >   warn if they are unmerged. Besides, we will already warn if the user
> >   requests to update such entries, anyway. And finally, unmerged
> >   entries should not have the skip_worktree bit set in the first place.
> >   (`git merge` should clear this bit when writing the new index, and
> >   neither `git sparse-checkout` nor `git update-index` allows setting
> >   the bit on an unmerged entry.)
> >
> >   Change (2) is perhaps not very beneficial, but it is also not harmful.
> >   The only practical difference we get by not setting the CE_UPTODATE
> >   flag in the skip_worktree entries is that, when writing a new index at
> >   the end of `git add --refresh`, do_write_index() will start checking
> >   if these entries are racy clean. Note that it already does that for
> >   all the skip_worktree entries that do not match the user-given
> >   pathspecs. And, in fact, this behavior distinction based on the
> >   pathspec only happens with `--refresh`. Plain `git add` and other
> >   options don't mark any skip_worktree entry with CE_UPTODATE
> >   (regardless of the pathspecs) and thus, all these entries are checked
> >   when writing the index. So `git add --refresh` will only do what the
> >   other options already do.
>
> Sorry for the delay.  These two changes sound good to me, and the
> range-diff looks reasonable.
>
> >   (Additionally, as I mentioned in [1], there might actually be at least
> >   one advantage in checking if the skip_worktree entries are racy clean.
> >   But this is a very specific case, and it's probably a topic for
> >   another thread :)
> >
> > [1]: https://lore.kernel.org/git/CAHd-oW4kRLjV9Sq3CFt-V1Ot9pYFzJggU1zPp3Hcuw=qWfq7Mg@xxxxxxxxxxxxxx/
>
> This I'm a bit surprised by.  I thought the outcome there was that you
> didn't want to mark skip_worktree entries as CE_UPTODATE in order to
> force them to be stat'd in the future when someone clears the
> skip_worktree bit.

Hmm, not exactly. This situation is a bit tricky and I probably got
lost when trying to communicate my thoughts about it.

In short, the outcome of not marking skip_worktree entries as
CE_UPTODATE (which is an in-memory-only flag) is that, when writing
the updated index at the end of `git add --refresh`, we now properly
detect and mark skip_worktree entries whose associated files are
present in the working tree and are modified in relation to the
respective blobs. (This whole process is skipped for CE_UPTODATE
entries.)

This doesn't have any effect while the skip_worktree bit is set. But
it makes it possible for a later `git status` to properly show the
files as modified when the skip_worktree bit gets unset. If we don't
do this, the later `git status` will wrongly think these entries are
clean.

This is because of the way git detects racily clean entries.
Paraphrasing `Documentation/technical/racy-git.txt`, we take two
actions to diagnose these entries:

1) When we want to know if an entry is up-to-date: if the entry's
timestamp is equal to, or newer than, the index timestamp, we not only
compare the cached stat info with the filesystem stat info but we also
compare the actual contents.

2) When writing a new index: if the index contains racily clean
entries, their `st_size` is truncated to zero.

Item 2) is important because, otherwise, subsequent operations
wouldn't be able to detect the racily clean entries using 1): once the
index is rewritten, its timestamp is newer than the entries'
timestamps, so the content comparison is no longer triggered.
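
To make the interplay between 1) and 2) concrete, here is a toy model.
It is only a sketch and not git's actual code: the names `Entry`,
`WorktreeFile`, `looks_clean`, `write_index` and `demo` are made up for
illustration, timestamps are plain integers, and a string comparison
stands in for comparing the file against the blob.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Cached data for one index entry (toy model, not git's struct)."""
    mtime: int    # cached file mtime
    size: int     # cached st_size
    content: str  # stands in for the blob the entry points to


@dataclass
class WorktreeFile:
    mtime: int
    content: str

    @property
    def size(self):
        return len(self.content)


def is_racy(entry, index_mtime):
    # Entry timestamp equal to, or newer than, the index timestamp.
    return entry.mtime >= index_mtime


def looks_clean(entry, wt, index_mtime):
    """Action 1: stat comparison, plus content comparison if racy."""
    if (entry.mtime, entry.size) != (wt.mtime, wt.size):
        return False
    if is_racy(entry, index_mtime):
        return entry.content == wt.content
    return True


def write_index(entries, worktree, index_mtime, new_mtime, smudge=True):
    """Action 2: zero the cached size of racily clean entries on write."""
    if smudge:
        for path, entry in entries.items():
            wt = worktree[path]
            stat_matches = (entry.mtime, entry.size) == (wt.mtime, wt.size)
            if (stat_matches and is_racy(entry, index_mtime)
                    and entry.content != wt.content):
                entry.size = 0
    return new_mtime  # timestamp of the freshly written index


def demo(smudge):
    # The file was edited in the same second the index was written,
    # keeping the same size, so the cached stat data still matches.
    entries = {"f": Entry(mtime=10, size=2, content="aa")}
    worktree = {"f": WorktreeFile(mtime=10, content="bb")}
    index_mtime = 10
    before = looks_clean(entries["f"], worktree["f"], index_mtime)
    index_mtime = write_index(entries, worktree, index_mtime,
                              new_mtime=20, smudge=smudge)
    after = looks_clean(entries["f"], worktree["f"], index_mtime)
    return before, after
```

In this model, `demo(smudge=True)` reports the file as modified both
before and after the index is rewritten, while `demo(smudge=False)`
reports it as clean after the rewrite, because the newer index
timestamp no longer triggers the content comparison.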

And that's what happens with skip_worktree entries on `git add
--refresh`. We mark them as CE_UPTODATE even if the file exists in the
working tree, so we don't check if the cached entry is racily clean,
and thus we don't truncate `st_size` to 0, hiding the racily clean
entry.
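
As a rough illustration of that last point, the write-time check can be
pictured like this. Again, this is a hypothetical sketch rather than
git's implementation: `Entry`, `smudge_on_write` and the value chosen
for `CE_UPTODATE` are assumptions made for the example.

```python
from dataclasses import dataclass

CE_UPTODATE = 0x1  # toy flag value, not git's actual bit


@dataclass
class Entry:
    mtime: int      # cached file mtime
    size: int       # cached st_size
    blob: str       # stands in for the blob contents
    flags: int = 0  # in-memory flags only


def smudge_on_write(entry, file_mtime, file_data, index_mtime):
    """Return the cached size that would be written for this entry."""
    if entry.flags & CE_UPTODATE:
        # Trusted as clean: the racy check is skipped entirely.
        return entry.size
    stat_matches = (entry.mtime == file_mtime
                    and entry.size == len(file_data))
    racy = entry.mtime >= index_mtime
    if stat_matches and racy and entry.blob != file_data:
        entry.size = 0  # force a content comparison later on
    return entry.size


# Same racily clean file, with and without the in-memory flag.
marked = Entry(mtime=10, size=2, blob="aa", flags=CE_UPTODATE)
unmarked = Entry(mtime=10, size=2, blob="aa")
marked_size = smudge_on_write(marked, 10, "bb", index_mtime=10)
unmarked_size = smudge_on_write(unmarked, 10, "bb", index_mtime=10)
```

Here `marked_size` keeps the stale cached size while `unmarked_size`
becomes 0, which is what lets a later refresh notice the modification
once the skip_worktree bit is cleared.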

With all that said, I think this whole situation must be quite rare
and not very important in practice...


