Re: [PATCH 7/8] config: add core.untrackedCache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 15, 2015 at 2:04 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> I'm replying to & quoting from two E-Mails of yours at once here for
> clarity & less noise. I'm working wich Christian on getting this
> integrated, and we both thought it would be good to have some fresh
> input on the matter from me.
>
>> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
>>> If you want only some repos to use the UC, you will set
>>> core.untrackedCache in the repo config. Then after cloning such a
>>> repo, you will copy the config file, and this will not be enough to
>>> enable the UC.
>>
>> Surely.  "Does this index file keeps track of the untracked files'
>> states?" is a property of the index.  Cloning does not propagate the
>> configuration and copying or not copying is irrelevant.  If you want
>> to enable, running "update-index --untracked-cache" is a way to do
>> so.  I cannot see what's so hard about it.
>>
>>> And if you have set core.untrackedCache in the global config when you
>>> clone, UC is enabled, but if you have just set it in the repo config
>>> after the clone, it is not enabled.
>>
>> That's fine.  In your patch series, if you set it in the global, you
>> will get the cache in the new one.  With the cleaned-up semantics I
>> suggested, the same thing will happen.
>>
>> And with the cleaned-up semantics, the configuration is *ONLY* used
>> to give the *DEFAULT* before other things happen, i.e. creation of
>> the index file for the first time.  Because the configuration is
>> only the default, an explicit "update-index --[no-]untracked-cache"
>> will defeat it, just like any other config/option interaction.
>
> As you know Christian is working on this for Booking.com to integrate
> features we find useful into git.git in such a way that we don't have
> to maintain some internal fork of Git.
>
> What we're trying to do, and what a lot of other big deployments of
> Git elsewhere would also find useful, is to ship a default sensible
> configuration for all users on the system in /etc/gitconfig.
>
> I'd like to be able to easily enable some feature that aids Git
> performance globally on our thousands of machines and for our hundreds
> of users by just tweaking something in puppet to change
> /etc/gitconfig, and more importantly if that change ends up being bad
> reverting that config in /etc/gitconfig should undo the change.
>
> It's an unacceptable level of complexity for system-level automation
> to have to scour the filesystem for existing Git repositories and run
> "git update-index" on each of them, that's why we're submitting
> patches to make this a config option, so we can simply flip a flag in
> /etc/gitconfig.
>
> It's also unacceptable to have the config simply provide the default
> which'll be frozen either at clone time or after an initial "git
> status".
>
> Let's say I ship a /etc/gitconfig that says "new clones should use the
> untracked cache". Now I roll that out across our fleet of machines and
> it turns out the morning after that the feature doesn't work properly
> for whatever reason. If it's just a "default until clone or status"
> type of thing even if I revert the configuration a lot of users &
> their repositories in the wild will still be broken, and will have to
> be manually fixed. Which again leads to the scouring the filesystem
> problem.
>
> So that gives some more context for why we're pushing for this change.
> I believe this feature breaks no existing use-case and just supports
> new ones, and I think that your objections to it are based on a simple
> misunderstanding as will become apparent if you read on below.
>
>> The biggest issue I had with your patch series, IIRC, is that
>> configuration will defeat the command line option.
>
> I think it's a moot point to focus on configuration v.s. command-line
> option. The important question is whether or not this feature can
> still be configured on a repo-local basis with this series as before.
> That's still the case since --local git configuration overrides
> --global and --system, so users who want to enable/disable this
> per-repo still can.
>
>>> Shouldn't it be nice if they could just enable core.untrackedCache in
>>> the global config files without having to also cd into every repo and
>>> use "git update-index --untracked-cache" there?
>>
>> NO.  It is bad to change the behaviour behind users' back.
>
> I'm not quite sure what the objection here is exactly. If you're a
> normal user you can enable/disable this per-repo just like you can
> now, and enable/disable it for all your repos in ~/.gitconfig.
>
> If you mean that the user's configuration shouldn't be changed by the
> global config in /etc/gitconfig I do think that's a moot point. If
> you're a user on a system where I have root and I want to change your
> Git configuration I'm going to be able to do that whatever the
> mechanism is.
>
> That's indeed that's what we're doing to enable this at Booking.com
> currently, we run a job to find some limited set of common checkouts
> and run "git update-index" for users as root. The problem with that is
> that it's needlessly complex, hence this series.
>
> But in case you mean disabling the config for existing checkouts if
> there's no configuration, I address that at the end of this mail.
>
> [...]
>> The primary reason why I do not like your "configuration decides" is
>> it will be a huge source of confusions and bugs.  Imagine what
>> should happen in this sequence, and when should a stale cached
>> information be discarded?
>>
>>  - the configuration is set to 'yes'.
>>  - the index is updated and written by various commands.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'no'.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'yes'.
>>  - more work is done in the working tree without updating the index.
>>  - somebody asks "what untracked paths are there?"
>>
>> In the "configuration decides" world, I am not sure how a sane
>> implementation efficiently invalidates the cache as needed, without
>> the config subsystem having intimate knowledge with the untracked
>> cache (so far, the config subsystem is merely a key-value store and
>> does not care _what_ it stores; you would want to invalidate the
>> cache in the index when somebody sets the variable to 'no', which
>> means the config subsystem needs to know that this variable is
>> special).
>
> I think this is the main misunderstanding about how this works that
> needs to be clarified.
>
> It would indeed really suck if changing this to some configuration
> option introduced some race condition where fiddling with the config
> option would render the cache stale & invalid. I fully agree that that
> should prevent the inclusion of this patch series.
>
> The way the "config decides" patch series deals with this is that if
> you have the UC information in the index and the configuration is set
> to core.untrackedCache=false the UC will be removed from the index.
>
> Otherwise you would indeed easily end up with a stale cache, since you
> could disable it, then make some tweaks to the index or your files,
> and then subsequently enable it and end up with nonsensical "git
> status" output.
>
> There's a test for this in t/t7063-status-untracked-cache.sh called
> "unsetting core.untrackedCache and using git status removes the
> cache".
>
> Summing this up: The only thing that this configuration potentially
> *does* break, and doesn't address, and which could be fixed. Is the
> following scenario. Once this series is applied and git is shipped
> with it existing users that have set "git update-index
> --untracked-cache" will have their UC feature disabled. This is
> because we fall back to "oh no config here? Must have been disabled,
> rm it from the index" clause which keeps our UC from going stale in
> the face of config flip-flopping.
>
> We *could* make even that use-case work by detecting the legacy marker
> for the UC in the index (the uname info), then we'd do a one-time "git
> config --local core.untrackedCache true" and remove the marker. Thus
> users who upgrade git and had the untracked cache enabled already
> would have their checkouts migrated to whatever their existing setup
> was.

The above needs a patch, that I haven't sent yet, but will send really
soon now, and that removes everything from the "ident" field in the
UC, because this field is useless with the patch series.

> I don't think that's worth it for two reasons 1) This is a really new
> experimental feature and I think it's fine to just change how it works
> 2) It's just as likely that this surprises users and doesn't do what
> they want, i.e. someone will think "neat, I can toggle this in
> ~/.gitconfig now", then they set it to "false" because they don't want
> it, but some of their existing checkouts that had it enabled will be
> migrated to "core.untrackedCache=true".
>
> So I think it makes the most sense to just apply this as-is.

Or rather to apply something based on the patch series I will send soon.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]