Re: [PATCH 7/8] config: add core.untrackedCache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:

I'm replying to & quoting from two E-Mails of yours at once here for
clarity & less noise. I'm working wich Christian on getting this
integrated, and we both thought it would be good to have some fresh
input on the matter from me.

> Christian Couder <christian.couder@xxxxxxxxx> writes:

>> If you want only some repos to use the UC, you will set
>> core.untrackedCache in the repo config. Then after cloning such a
>> repo, you will copy the config file, and this will not be enough to
>> enable the UC.
>
> Surely.  "Does this index file keeps track of the untracked files'
> states?" is a property of the index.  Cloning does not propagate the
> configuration and copying or not copying is irrelevant.  If you want
> to enable, running "update-index --untracked-cache" is a way to do
> so.  I cannot see what's so hard about it.
>
>> And if you have set core.untrackedCache in the global config when you
>> clone, UC is enabled, but if you have just set it in the repo config
>> after the clone, it is not enabled.
>
> That's fine.  In your patch series, if you set it in the global, you
> will get the cache in the new one.  With the cleaned-up semantics I
> suggested, the same thing will happen.
>
> And with the cleaned-up semantics, the configuration is *ONLY* used
> to give the *DEFAULT* before other things happen, i.e. creation of
> the index file for the first time.  Because the configuration is
> only the default, an explicit "update-index --[no-]untracked-cache"
> will defeat it, just like any other config/option interaction.

As you know Christian is working on this for Booking.com to integrate
features we find useful into git.git in such a way that we don't have
to maintain some internal fork of Git.

What we're trying to do, and what a lot of other big deployments of
Git elsewhere would also find useful, is to ship a default sensible
configuration for all users on the system in /etc/gitconfig.

I'd like to be able to easily enable some feature that aids Git
performance globally on our thousands of machines and for our hundreds
of users by just tweaking something in puppet to change
/etc/gitconfig, and more importantly if that change ends up being bad
reverting that config in /etc/gitconfig should undo the change.

It's an unacceptable level of complexity for system-level automation
to have to scour the filesystem for existing Git repositories and run
"git update-index" on each of them, that's why we're submitting
patches to make this a config option, so we can simply flip a flag in
/etc/gitconfig.

It's also unacceptable to have the config simply provide the default
which'll be frozen either at clone time or after an initial "git
status".

Let's say I ship a /etc/gitconfig that says "new clones should use the
untracked cache". Now I roll that out across our fleet of machines and
it turns out the morning after that the feature doesn't work properly
for whatever reason. If it's just a "default until clone or status"
type of thing even if I revert the configuration a lot of users &
their repositories in the wild will still be broken, and will have to
be manually fixed. Which again leads to the scouring the filesystem
problem.

So that gives some more context for why we're pushing for this change.
I believe this feature breaks no existing use-case and just supports
new ones, and I think that your objections to it are based on a simple
misunderstanding as will become apparent if you read on below.

> The biggest issue I had with your patch series, IIRC, is that
> configuration will defeat the command line option.

I think it's a moot point to focus on configuration v.s. command-line
option. The important question is whether or not this feature can
still be configured on a repo-local basis with this series as before.
That's still the case since --local git configuration overrides
--global and --system, so users who want to enable/disable this
per-repo still can.

>> Shouldn't it be nice if they could just enable core.untrackedCache in
>> the global config files without having to also cd into every repo and
>> use "git update-index --untracked-cache" there?
>
> NO.  It is bad to change the behaviour behind users' back.

I'm not quite sure what the objection here is exactly. If you're a
normal user you can enable/disable this per-repo just like you can
now, and enable/disable it for all your repos in ~/.gitconfig.

If you mean that the user's configuration shouldn't be changed by the
global config in /etc/gitconfig I do think that's a moot point. If
you're a user on a system where I have root and I want to change your
Git configuration I'm going to be able to do that whatever the
mechanism is.

That's indeed that's what we're doing to enable this at Booking.com
currently, we run a job to find some limited set of common checkouts
and run "git update-index" for users as root. The problem with that is
that it's needlessly complex, hence this series.

But in case you mean disabling the config for existing checkouts if
there's no configuration, I address that at the end of this mail.

[...]
> The primary reason why I do not like your "configuration decides" is
> it will be a huge source of confusions and bugs.  Imagine what
> should happen in this sequence, and when should a stale cached
> information be discarded?
>
>  - the configuration is set to 'yes'.
>  - the index is updated and written by various commands.
>  - more work is done in the working tree without updating the index.
>  - the configuration is set to 'no'.
>  - more work is done in the working tree without updating the index.
>  - the configuration is set to 'yes'.
>  - more work is done in the working tree without updating the index.
>  - somebody asks "what untracked paths are there?"
>
> In the "configuration decides" world, I am not sure how a sane
> implementation efficiently invalidates the cache as needed, without
> the config subsystem having intimate knowledge with the untracked
> cache (so far, the config subsystem is merely a key-value store and
> does not care _what_ it stores; you would want to invalidate the
> cache in the index when somebody sets the variable to 'no', which
> means the config subsystem needs to know that this variable is
> special).

I think this is the main misunderstanding about how this works that
needs to be clarified.

It would indeed really suck if changing this to some configuration
option introduced some race condition where fiddling with the config
option would render the cache stale & invalid. I fully agree that that
should prevent the inclusion of this patch series.

The way the "config decides" patch series deals with this is that if
you have the UC information in the index and the configuration is set
to core.untrackedCache=false the UC will be removed from the index.

Otherwise you would indeed easily end up with a stale cache, since you
could disable it, then make some tweaks to the index or your files,
and then subsequently enable it and end up with nonsensical "git
status" output.

There's a test for this in t/t7063-status-untracked-cache.sh called
"unsetting core.untrackedCache and using git status removes the
cache".

Summing this up: The only thing that this configuration potentially
*does* break, and doesn't address, and which could be fixed. Is the
following scenario. Once this series is applied and git is shipped
with it existing users that have set "git update-index
--untracked-cache" will have their UC feature disabled. This is
because we fall back to "oh no config here? Must have been disabled,
rm it from the index" clause which keeps our UC from going stale in
the face of config flip-flopping.

We *could* make even that use-case work by detecting the legacy marker
for the UC in the index (the uname info), then we'd do a one-time "git
config --local core.untrackedCache true" and remove the marker. Thus
users who upgrade git and had the untracked cache enabled already
would have their checkouts migrated to whatever their existing setup
was.

I don't think that's worth it for two reasons 1) This is a really new
experimental feature and I think it's fine to just change how it works
2) It's just as likely that this surprises users and doesn't do what
they want, i.e. someone will think "neat, I can toggle this in
~/.gitconfig now", then they set it to "false" because they don't want
it, but some of their existing checkouts that had it enabled will be
migrated to "core.untrackedCache=true".

So I think it makes the most sense to just apply this as-is.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]