Re: [RFD] Rewriting safety - warn before/when rewriting published history

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 6 Feb 2012, Johan Herland wrote:
> On Mon, Feb 6, 2012 at 18:14, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> On Mon, 6 Feb 2012, Johan Herland wrote:
>>> On Mon, Feb 6, 2012 at 15:44, Jakub Narebski <jnareb@xxxxxxxxx> wrote:

[...]
>> Yes, starting with prototype implementation using existing infrastructure
>> (hooks) would be a very good idea.  (That's how first versions of what
>> became submodules were implemented.)
>>
>> OTOH we should be aware of limitations of said prototype due to the fact
>> that it is a prototype...
> 
> Agreed, but AFAICS (and modulo the addition of pre-rewrite and
> pre/post-push hooks mentioned earlier) all of the things discussed so
> far in this thread can be implemented as hooks.

That would be nice.

And the new hooks (pre-rewrite, pre/post-push) would be useful anyway,
I think.

[...]
>>> And trying to plug too many holes might end
>>> up annoying more experienced users who "know what they're doing".
>>
>> Second, forcing via command line parameter should always be an option.
> 
> Obviously, but if a large portion of the Git community felt the need
> to always disable this feature, I'd say that we'd failed. The best
> features are those that Just Work(tm).

Well, we have some features in git like advice.* that cater to newbies.
This could be such newbie-mostly feature too.

Anyway what I was thinking about is that forcing is for those hopefully
rare cases where Git safety net is too strict, where DWIM-mery fails.

>>> Instead we might want to add a client-side check at push time. I
>>> realize that this check is already done by the remote end, but the
>>> client-side might be able to give a more helpful response along the
>>> lines of:
>>
>> [...]
>>
>> Explanation is good, but the whole idea of rewriting safety is that you
>> are informed (warned or denied) _before_ attempting rewrite and doing much
>> work.
> 
> True, but there may be cases where the rewrite is not apparent until
> after it has happened. E.g. a novice user may use 'git reset --hard'
> in order to get to an earlier state for testing purposes, and then -
> after completing the test - 'git reset --hard' back to the starting
> point. I know this is not best practice, but is it bad enough that we
> want to refuse it?

I'd like to have safety net also for using 'git reset --hard' to rewind
publishable branch past published history.  Though preventing e.g. force
rename of other branch to publishable branch so it doesn't fast-forward
WRT its upstream would be going to far, I guess...
 
>>> First, we don't need to annotate _all_ commits. For the 'public'
>>> state, we only annotate the last/tip commit that was pushed/published.
>>> From there, we can defer that all ancestor commits are also 'public'.
>>
>> Right.
>>
>>> For the 'secret' state, we do indeed annotate _all_ secret commits,
>>> but I believe this will be a somewhat limited number of commits. If
>>> your workflow forces you to annotate millions of commits as 'secret',
>>> I claim there is something wrong with your workflow.
>>
>> Well, for the 'secret' we can rely on the fact that child of 'secret'
>> commit must also be 'secret' (non-publishable) if secret is to stay
>> secret.  Still marking all 'secret' commits might be better idea from
>> UI and from performance point of view.
> 
> I don't think we should automatically assume that all children of a
> 'secret' commit are also 'secret'.

All right.  What is important is that safety net related to 'secret'
commits would prevent publishing such commits even if you build non-secret
commits on top of 'secret' one.

>                                 First of all, the git DAG was not 
> made for iterating forwards in history, so given a 'secret' commit, it
> is computationally expensive to enumerate all its implied-'secret'
> descendants.

Right, this would enormously complicate detecting 'secret'-ness of commit
when browsing history / querying "phase" of a commit..

> More importantly though, I don't agree with the premise. 
> I would typically use the 'secret' state as follows: While debugging a
> piece of code, I might commit a few debug print statements, and I
> would typically mark this debug commit as 'secret', in order to
> prevent myself from accidentally pushing this. Although it probably
> doesn't matter in practice, I think it is wrong for the commits made
> (temporarily) on top of this debug commit to be also considered
> 'secret'. They are only unpublishable as a consequence of being based
> on the debug commit, and only until I get around to rebasing away the
> 'secret' debug commit.

Right, thanks for sanity check.

So another difference that in its full complexity the 'secret' is an
attribute of an individual commit which is (usually) local to repository,
while 'public' is in its full complexity an attribute of revision walking
rather than something that can be attached to a commit.

Also, when thinking about different scenarios of why one would like to
mark commit as 'secret', we might want to be able to mark commit as
secret / unpublishable with respect to _subset_ of remotes, so e.g.
I am prevented from accidentally publishing commits marked as 'secret'
to public repository, or to CI/QA repository, but I can push (perhaps
with warning) to group repository, together with 'secret'-ness state
of said commit... 

... though it wouldn't be as much 'secret' as 'confidential' ;-)

>>>> Second, I have doubts if "phase" is really state of an individual commit,
>>>> and not the feature of revision walking.
>>
>> It matters to presentation: can commit be simultaneously 'public' because
>> of one branch, and 'draft' because of other.
>>
>>> I believe the 'public' state is a "feature of revision walking" (i.e.
>>> one annotated 'public' commit implies that all its ancestors are also
>>> 'public'). However, the 'secret' state should be bound to the
>>> individual commit, IMHO.
>>
>> Good call, otherwise 'secret' commit could have been "side-leaked"
>> by other refs being pushed.
>>
>> This means though that 'public' / 'draft' while looking similar to 'secret'
>> are in fact a bit different things.  In other words 'immutable' and
>> 'impushable' traits are quite a bit different in behavior...
>>
>> Especially that one acts at pre-rewrite time, and second pre-push time.
> 
> Exactly. I find Mercurial 'phase' language confusing, precisely for
> the reason that 'public' and 'secret' are DIFFERENT concepts. One
> hinders rewrite and naturally applies to a commit AND its ancestors,
> while the other hinders push and only applies to the commit itself.
> The fact that they could be implemented by the same mechanisms (hooks
> and notes) does not make them the same thing.

Well, if one doesn't think about all possible usages and complications,
the simplicity of metaphor of "phases" is quite compelling.  We have
(at first glance)

   public < draft < secret

states of a commit (changeset), just like we have

   solid < liquid < gaseous

phases of matter.  

If we treat traits of 'immutability' and 'publishability' as true boolean,
such ordering of "phases" looks natural.  Just like matter goes from gas
to liquid to solid with lowering temperature, changeset "phase" goes from
'secret' to 'draft' to 'public' following ancestry chain.

But if we go for expressivity and power like the rest of Git, rather than
for simplicity like Mercurial does, the differences between 'secret'-ness
and 'public'-ness make it unlikely that it would be a good idea to encompass
those in a single UI and a single concept.  I guess that it can be another
issue where Mercurial oversimplifies its approach (c.f. branches which
started from clone to branch and anonymous branches, to "named branches"
which are really commit labels, to local-only bookmark branches, to 
transferrable Git-like bookmark branches but with single namespace, and
only just considering equivalent of Git's refspec and refs mapping; c.f.
revision ranges based on local numbering of commits which made ranges
local too, to new Git-like topological `--ancestry-path`-like ranges). 


Also with per-remote 'secret'-ness, we might have the scenario where
a commit is marked as 'secret' for public repo, we have pushed it to
group repo (and it's 'secret'-ness trait together with it), an now we
want to be warned if we try to rewrite said commit.

So, separate traits it is (of revision walk, or of a commit), rather
than "smush-together" antity / attribute of "phase" of revision.


That is BTW why I wanted us to have this discussion... :-D

[...]
>>> Basically, there are three different "levels" for this rewrite/publish
>>> protection to run at:
>>>
>>> 1. Do not meddle at all. This is the current behavior, and assumes
>>> that if the user rewrites and pushes something, the user knows what
>>> he/she is doing, and Git should not meddle (obviously unless the
>>> server refuses the push).
>>
>> I think that there should be some easy way to force such behavior,
>> i.e. to discard rewrite safeties.
> 
> Indeed. We should probably have a simple config flag to enable the
> rewrite protection. In fact I would argue that the flag should default
> to false (disable protection) when unset, and then we should let
> init/clone set the config flag to true (enable protection) in newly
> created repos (unless explicitly disabled in the user/system configs).
> This way, behavior does not change for existing repos, but new repos
> are protected by default (with only a single command needed to disable
> the protection).

And commit option for discarding safeties for specific push, or for
specific rewrite (e.g. pushed at night, noticed an error in commit
message, force-amended, and force-pushed again, hoping that the window
of opportunity is sufficintly small so that nobody will be affected).

>>> 2. Warn/refuse rewriting commits in your upstream. This would only
>>> check branch X against its registered upstream. Only if there is a
>>> registered upstream, and you're about to rewrite commits that are
>>> reachable from the upstream remote-tracking branch, should Git
>>> intervene and warn/refuse the rewrite. This level would IMHO provide
>>> most of the benefit, and little or no trouble (i.e. false positives).
>>
>> Right.  I wonder if we can get usage statistics from Mercurial users
>> about usage of their "phases" feature... though mapping terminology
>> for example 'upstream' from Git to Mercurial and vice versa can be
>> a pain, I guess.
> 
> I'm unsure how useful it would be. IMHO Git and Mercurial are
> different enough (and promote sightly different workflows) that I
> don't trust the the average Mercurial user's preference for
> Mercurial's 'phase' behavior to be transferable to the average Git
> user's preference for a similar behavior in Git.

Well, bad statistics might be better than no statistics.  What would
be especially interesting is _complaints_ about "phases" feature,
don't you think, i.e. where the "phases" metaphor fails.

[...]

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]