Re: Untracked Cache and --untracked-files=all

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Tue, Jun 22, 2021 at 2:25 AM Tao Klerks <tao@xxxxxxxxxx> wrote:
>
> Hi folks,
>
> I'm hoping for a little help understanding *intent* around a
> particular code comment in dir.c, and reaching out to the whole list

Oh, boy, dir.c.  We should probably put a simple warning at the top of
the file, something like "Abandon all hope, all ye who enter here".

> because someone (Junio?) said they consider any mail that *doesn't*
> copy the list to be spam anyway, and the original author of the
> comment in question (Duy Nguyen) is no longer active on git.
>
> Context:
> I am trying to explore how to get "--untracked-files=all" to play nice
> with the untracked cache, so that windows users using tooling that
> sets "--untracked-files=all" can benefit from the same
> orders-of-magnitude git status performance improvements as commandline
> users.
>
> There is a "naive" approach to this (store the untracked cache in the
> index file with whatever dir flags were specified/used in the
> recursive walk, and ignore/rewrite that cached data every time the
> flags change), and there is presumably a more "comprehensive" approach
> (store all the information required in the untracked cache to be able
> to satisfy requests with either set of flags - even if this is a
> little more expensive on first run).
>
> The main disadvantage of the "naive" approach is that is every time
> you flip-flop between "git status" and "git status -u", the untracked
> cache is ignored, a recursive directory walk ensues, and the untracked
> cache is rewritten to the index file for the next time you rerun,
> hopefully with the same flags. However, I would think in most
> situations flip-flopping will be less common - more commonly you're
> using a tool or workflow that ends up running the same command(s)
> repeatedly... At least, that's my thesis. I would put this "store the
> untracked cache every time dir flags change" behavior behind a config
> switch, anyway.
>
> This "naive" approach *is* rather easy to achieve - you just need to
> recreate a new "untracked" structure inside
> dir.c#validate_untracked_cache() if you find a mismatch of flags (and
> make other small fixes to store the correct flags in that new
> "untracked" structure).
>
> The one thing that sticks out after making these changes is a code
> comment first introduced in 2015 by Duy Nguyen in ccad261f, explaining
> *why* we refuse to use the untracked cache with "-uall":
> > * See treat_directory(), case index_nonexistent. Without
> > * this flag, we may need to also cache .git file content
> > * for the resolve_gitlink_ref() call, which we don't.
>
> I've seen this comment many times over the past few months, and I've
> previously always interpreted it as a "correctness" concern.
>
> Looking at the original code in question, however (as of ccad261f,
> dir.c#treat_directory(), "resolve_gitlink_ref" call), I don't
> understand how correctness could be impacted.
>
> Fast-forward 6 years and all this code has been substantially
> overhauled by several folks over the years (most recently and majorly
> Elijah Newren), and the "resolve_gitlink_ref()" call is long-gone.

I didn't remember removing any resolve_gitlink_ref() calls, but a
quick search (`git log -Sresolve_gitlink_ref -p -- dir.c`) suggests it
was removed by b22827045e0b (dir: do not traverse repositories with no
commits, 2019-04-09).  A brief look at that commit message certainly
suggests things have changed in a way that _might_ make the original
comment irrelevant.

> Does this look familiar to anyone? Is there any remaining obvious
> reason to be wary of storing the untracked cache structure produced
> with '-uall'?

Untracked cache is its own beast.  I fought with it a bunch, but it's
been a while.  (See e.g.
https://lore.kernel.org/git/CABPp-BEWhavmtDi-ahT+QMWtD6Fe-Ey7cn_82nbetWEQJL=hRA@xxxxxxxxxxxxxx/)

I can say that untracked traversal had all sorts of weirdness and
breakage, which could have made untracked cache harder to make right
and consistent particularly with -uall.  See for example 8d92fb292706
(dir: replace exponential algorithm with a linear one, 2020-04-01) and
perhaps aa6e1b21e5de (dir: avoid unnecessary traversal into ignored
directory, 2021-05-12).  In particular, the exponential traversal
(paths at depth N might be traversed 2^N times) from the former commit
combined with the fact that later traversals often returned a
different status for a path than the first traversal certainly could
have caused some additional weirdness for the untracked cache.

It may be that some combination of Kyle's removal of the
resolve_gitlink_ref() and my removal of the exponential traversal of
untracked paths make your idea safe...but this is dir.c and
untracked-cache, so who really knows?


Not sure if that helps, but I hope one or two pointers I provided are
somehow meaningful to you.

Elijah



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux