Re: [PATCH v3 5/7] refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Tue, Mar 30, 2021 at 11:51 AM Matheus Tavares Bernardino
<matheus.bernardino@xxxxxx> wrote:
>
> On Fri, Mar 19, 2021 at 1:05 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> writes:
> >
> > >> In other words, the change makes me wonder why we are not adding a
> > >> flag that says "do we or do we not want to match paths outside the
> > >> sparse checkout cone?", with which the seen[] would automatically
> > >> record the right thing.
> > >
> > > Yeah, makes sense. I didn't want to make the flag skip the sparse
> > > paths unconditionally (i.e. without matching them) because then we
> > > would also skip the ce_stage() checkings below and the later
> > > ce_mark_uptodate(). And I wasn't sure whether this could cause any
> > > unwanted side effects.
> > >
> > > But thinking more carefully about this now, unmerged paths should
> > > never have the SKIP_WORKTREE bit set anyway, right? What about the
> > > CE_UPTODATE mark, would it be safe to skip it? I'm not very familiar
> > > with this code, but I'll try to investigate more later.
>
> Sorry I haven't given any update on this yet. From what I could see so
> far, it seems OK to ignore the skip_worktree entries in
> refresh_index() when it is called from `git add --refresh`. But
> because we would no longer mark the skip_worktree entries that match
> the pathspec with CE_UPTODATE, do_write_index() would start checking
> if they are racy clean (this is only done when `!ce_uptodate(ce)`),
> which could add some lstat() overhead.
>
> However, this made me think what happens today if we do have a racy
> clean entry with the skip_worktree bit set... `git add --refresh` will
> probably update the index without noticing that the entry is racy
> clean (because it won't check CE_UPTODATE entries, and skip_worktree
> entries will have this bit set in refresh_index()). Thus the entries'
> size won't be truncated to zero when writing the index, and the entry
> will appear unchanged even if we later unset the skip_worktree bit.
>
> But can we have a "racy clean skip_worktree entry"? Yes, this seems
> possible e.g. if the following sequence happens fast enough for mtime
> to be the same before and after the update:
>
>   echo x >file
>   git update-index --refresh --skip-worktree file
>   echo y>file
>
> Here is a more complete example which artificially creates a "racy
> clean skip_worktree entry", runs `git add --refresh`, and shows that
> the racy clean entry was not detected:
>
> # Setup
> echo sparse >sparse
> echo dense >dense
> git add .
> git commit -m files
>
> # Emulate a racy clean situation
> touch -d yesterday date
> touch -r date sparse
> git update-index --refresh --skip-worktree sparse
> touch -r date .git/index
> echo xparse >sparse
> touch -r date sparse
>
> # `git add` will now write a new index without checking if
> # `sparse` is racy clean nor truncating its size
> touch -r date dense
> git add --refresh .
>
> git update-index --no-skip-worktree sparse
> git status
> <doesn't show that `sparse` was modified>
>
> This situation feels rather uncommon, but perhaps the same could
> happen with `git sparse-checkout set` instead of `git update-index
> --refresh --skip-worktree`? IDK. This made me think whether
> refresh_index() should really mark skip_worktree entries with
> CE_UPTODATE, in the first place.
>
> Any thoughts?

Wow, that's a weird one.  Nice digging.  I don't understand the racily
clean stuff that well, or the CE_UPTODATE handling...but based on my
meager understanding of these things I'd say that marking
skip_worktree entries with CE_UPTODATE seems wrong and I'd agree with
your hunch that we shouldn't be updating it for those files.  If an
entry is skip_worktree, it's assumed to not be present in the working
tree and that we'll treat it as "unchanged".  So, when the filehappens
to be present despite that bit being set, checking the files' stat
information or pretending to have done so so we can claim it is up to
date seems wrong to me.  In fact, I'd say the mere recognition that
the file is present in the working tree despite being SKIP_WORKTREE to
me implies it's not "up-to-date" for at least one definition of that
term.

I suspect that if someone flips the skip-worktree bit on and off for a
file without removing it from the working tree, not marking
skip_worktree entries with CE_UPTODATE as you mention above might
force us to incur the cost of an extra stat on said file when we run
"git status" later.  If so, I think that's a totally reasonable cost,
so if that was your worry, I'd say that this is shifting the cost
where it belongs.

But, like I said, I'm not very familiar with the racily clean code or
CE_UPTODATE, so it's possible I said something above that doesn't even
make sense.  Does my reasoning help at all?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux