Re: "Purposes, Concepts, Misfits, and a Redesign of Git" (a research paper)

Jakub Narębski <jnareb@xxxxxxxxx> · Fri, 7 Oct 2016 20:05:39 +0200

On 7 October 2016 at 18:55, Santiago Perez De Rosso
<sperezde@xxxxxxxxxxxxx> wrote:
> On Wed, Oct 5, 2016 at 6:15 AM Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> On 5 October 2016 at 04:55, Santiago Perez De Rosso
>> <sperezde@xxxxxxxxxxxxx> wrote:
>>> On Fri, Sep 30, 2016 at 6:25 PM Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>>>> W dniu 30.09.2016 o 18:14, Konstantin Khomoutov pisze:
>>>>
>>>>> The "It Will Never Work in Theory" blog has just posted a summary of a
>>>>> study which tried to identify shortcomings in the design of Git.
>>>>>
>>>>> In the hope it might be interesting, I post this summary here.
>>>>> URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html
>>>>
>>>> I will comment on the article itself, not just on the summary.
>>>>
>>>> | 2.2 Git
>>>> [...]
>>>> | But tracked files cannot be ignored; to ignore a tracked file
>>>> | one has to mark it as “assume unchanged.” This “assume
>>>> | unchanged” file will not be recognized by add; to make it
>>>> | tracked again this marking has to be removed.
[...]
>> Yes, it is true that users may want to be able to ignore changes to
>> tracked files (commit with a dirty tree), but using `assume-unchanged` is
>> a wrong and dangerous solution.  Unfortunately the advice to use it is
>> surprisingly pervasive.  I would thank you to not further this error.
>
> Ok, I added a footnote in the paper when we first mention assume unchanged
> that says:
>
> Assume unchanged was intended to be used as a performance optimization but
> has since been appropriated by users as a way to ignore tracked files. The
> current advice is to use the “skip worktree” marking instead.
>
> This should prompt readers to look into skip worktree next time they want to
> ignore tracked files. I don't think people reading the paper are doing so to
> learn Git but at least it should contribute to not furthering the error.

Thank you very much.

The problem with "assume-unchanged" is that by using it to ignore
changes to tracked files you are lying to Git (telling it 'assume this
is unchanged' while changing it), which can lead to DATA LOSS, that
is, to losing those changes.
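
To make the "lying to Git" point concrete, here is a toy model in
Python. It is only a sketch, not Git's index code: `IndexEntry`,
`has_local_changes` and `checkout_file` are hypothetical names, and the
one thing modeled is a safety check that trusts the assume-unchanged
bit instead of looking at the file.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndexEntry:
    """Hypothetical, heavily simplified index entry (not Git's real format)."""
    path: str
    blob: str                       # content recorded in the index
    assume_unchanged: bool = False  # the user's promise: "I won't edit this"


def has_local_changes(entry: IndexEntry, worktree: Dict[str, str]) -> bool:
    # An assume-unchanged entry is trusted without reading the file;
    # skipping that read is the whole point of the performance optimization.
    if entry.assume_unchanged:
        return False
    return worktree.get(entry.path) != entry.blob


def checkout_file(entry: IndexEntry, worktree: Dict[str, str], new_blob: str) -> None:
    """Replace the file with new_blob, e.g. when switching to another commit.

    The toy refuses to clobber local changes it can see; it cannot
    protect changes it has been told do not exist.
    """
    if has_local_changes(entry, worktree):
        raise RuntimeError(f"local changes to {entry.path} would be overwritten")
    worktree[entry.path] = new_blob
    entry.blob = new_blob


if __name__ == "__main__":
    wt = {"config.ini": "my local settings"}

    honest = IndexEntry("config.ini", blob="committed settings")
    try:
        checkout_file(honest, wt, "upstream settings")
    except RuntimeError as err:
        print("refused:", err)      # the local edit survives

    lying = IndexEntry("config.ini", blob="committed settings", assume_unchanged=True)
    checkout_file(lying, wt, "upstream settings")
    print(wt["config.ini"])         # the local edit is silently gone
```

Which real Git commands end up on the second path depends on the command
and the Git version; the model only shows why the promise matters.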

[...]
>>>> [...]
>>>> | The problem
>>>> | is the lack of connection between this purpose and the highlevel
>>>> | purposes for version control, which suggests that the
>>>> | introduction of stashing might be to patch flaws in the design
>>>> | of Git and not to satisfy a requirement of version control.
>>>>
>>>> Or the problem might be that you are missing some (maybe minor)
>>>> requirement of a version control system. Just saying...
>>>
>>> What would that purpose be? and why would you say that's a
>>> high-level purpose for version control and not one that's
>>> git-specific?
>>
>> The stash (or rather its equivalent) is not something Git specific.
>> It is present also in other version control systems, among others:
>>
>> * Mercurial: as 'shelve' extension (in core since 1.8)
>> * Bazaar: as 'bzr shelve' command
>> * Fossil: as 'fossil stash' command (with subcommands)
>> * Subversion: Shelve planned for 1.10 (2017?)
>
> Do these other VCSs have the same "Switching branches" misfit? Do you
> usually need to stash if you want to switch with uncommitted changes? I know
> Mercurial has the same problem since "bookmarks" are like Git branches, so
> it makes sense for them to have added something like stashing (if otherwise
> switching with uncommitted changes would be very difficult).

I suspect that all those are inspired by each other, and that
they all use the 'uncommitted changes are not tied to a branch'
paradigm, which makes it quite easy to create a branch for changes
after the fact (when it turns out that implementing the feature
would take more than one commit).
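
A toy Python model of that paradigm, as a sketch only: `ToyRepo` and
its methods are hypothetical, branches are plain snapshots rather than
chains of commits, and file deletions are ignored. It shows a branch
created after the fact picking up the uncommitted changes, a switch
that is refused only when a locally modified file differs between the
two branches (roughly Git's rule), and stash as a LIFO stack of changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> content


@dataclass
class ToyRepo:
    """Hypothetical toy repository; the working tree belongs to no branch."""
    branches: Dict[str, Snapshot]
    current: str
    worktree: Snapshot = field(default_factory=dict)
    stash: List[Snapshot] = field(default_factory=list)  # newest last, like stash@{0}

    def __post_init__(self) -> None:
        if not self.worktree:
            self.worktree = dict(self.branches[self.current])

    def local_changes(self) -> Snapshot:
        # Uncommitted changes: working tree files that differ from the branch tip.
        tip = self.branches[self.current]
        return {p: c for p, c in self.worktree.items() if tip.get(p) != c}

    def create_branch(self, name: str) -> None:
        # Like 'git checkout -b': the new branch starts at the current tip and
        # the working tree is left alone, so the changes now sit on 'name'.
        self.branches[name] = dict(self.branches[self.current])
        self.current = name

    def switch(self, name: str) -> None:
        changes = self.local_changes()
        here, there = self.branches[self.current], self.branches[name]
        clashes = sorted(p for p in changes if here.get(p) != there.get(p))
        if clashes:
            raise RuntimeError(f"local changes to {clashes} would be overwritten")
        self.current = name
        self.worktree = {**there, **changes}  # carry non-clashing changes over

    def stash_push(self) -> None:
        self.stash.append(self.local_changes())
        self.worktree = dict(self.branches[self.current])

    def stash_pop(self) -> None:
        # Simplification: no conflict detection when re-applying.
        self.worktree.update(self.stash.pop())


if __name__ == "__main__":
    repo = ToyRepo({"main": {"a.txt": "1", "b.txt": "x"},
                    "other": {"a.txt": "2", "b.txt": "x"}}, current="main")
    repo.worktree["b.txt"] = "edited"   # start hacking on main...
    repo.create_branch("topic")         # ...then decide it deserves a branch
    repo.worktree["a.txt"] = "local"
    # repo.switch("other") would raise here: a.txt differs between the branches
    repo.stash_push()
    repo.switch("other")
    repo.stash_pop()
    print(repo.worktree)  # {'a.txt': 'local', 'b.txt': 'edited'}
```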

>> I would say that 'stash' could be considered to be about isolating work
>> on different features: sub-branch-sized parallel work.

Note that 'isolating work' is missing from your list of purposes
of a version control system; though it is fairly obvious.

>
> That sounds a lot like having independent lines of development, which is
> what branches are supposed to be for

Those are sub-commit changes. Branches are composed of
commits. But I agree that this may be a bit of a stretch in
trying to find a high-level purpose for stash (rather than it being
a convenience feature). As I said below...

>> But it might be that stash has no connection with the high-level
>> purposes of version control, and that it is purely a convenience
>> feature.  Just playing the role of Advocatus Diaboli (important in
>> scientific works, isn't it?)...

[...]

>>>> | 7. Gitless
[...]
>>>> |  Also, there
>>>> | is no possible way of getting in a “detached head” state; at
>>>> | any time, the user is always working on some branch (the
>>>> | “current” branch). Head is a per-branch reference to the last
>>>> | commit of the branch.
[...]
>>>> [...] during some long lived multi-step operations, like bisect
>>>> or interactive rebase, you are not really on any branch,
>>>
>>> In Gitless we don't have bisect but for rebase (fuse in Gitless) we
>>> record the current branch.
>>
>> No bisect?  This is a very useful feature.  Though it could be done
>> without a detached HEAD, using a specialized pseudo-branch 'bisect' (as
>> it was done in earlier versions of Git, or maybe even now).
>>
>> Anyway, for [interactive] rebase / transplant / graft / fuse you need
>> to be able to abort an operation and return to the state before
>> starting the rebase.  Though you can, or do, solve this by remembering
>> the starting position.
>
> Yes, Gitless remembers the starting position. We should be able to get
> bisect working too in the same way. Internally, the head is detached but
> that's irrelevant to the user. As far as the user is concerned she's still
> working on the current branch.
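
For bisect, a Python sketch of the pseudo-branch variant: the search is
an ordinary binary search over a linear history, and the commit under
test is held in a separate ref, so the user's own branch never moves.
`bisect_first_bad`, the ref name "bisect" and the linear-history
assumption are simplifications for illustration; real `git bisect`
also handles merges and non-linear history.

```python
from typing import Callable, Dict, Sequence


def bisect_first_bad(commits: Sequence[str],
                     is_bad: Callable[[str], bool],
                     refs: Dict[str, str]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        refs["bisect"] = commits[mid]   # hypothetical pseudo-branch under test
        if is_bad(refs["bisect"]):      # the user answers 'bad'...
            bad = mid
        else:                           # ...or 'good'
            good = mid
    refs.pop("bisect", None)            # clean up, like 'git bisect reset'
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]
    refs = {"main": "c9"}
    culprit = bisect_first_bad(history, lambda c: int(c[1:]) >= 6, refs)
    print(culprit, refs)  # c6 {'main': 'c9'}
```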

There are quite a few problems with the "remember the starting position"
approach. For one, the rebase / fuse operation should be recorded as a
whole in the branch reflog (assuming that you implement this feature).
Working on a branch would ordinarily mean that all those intermediate
steps would be recorded there (well, they are recorded, but in a separate
reflog, namely the HEAD reflog).

The second issue is that you wouldn't want your partially done rebase
to be visible; for example, you would want 'git push' not to include
partial work (which might get abandoned).

I suppose all this can be solved without user-visible detached HEAD...
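
A toy Python sketch of how both concerns can be met while the detached
HEAD stays an internal detail: intermediate steps go to the HEAD
reflog, the branch ref moves exactly once at the end (one branch-reflog
entry for the whole rebase), and push only sees branch refs.
`ToyRebase`, `pushable` and the commit-id scheme are hypothetical, and
the reflog messages are only loosely modeled on Git's.

```python
from typing import Dict, List


class ToyRebase:
    """Hypothetical toy: a rebase that runs on a detached HEAD and only
    moves the branch ref, with a single reflog entry, when it finishes."""

    def __init__(self, branches: Dict[str, str], reflogs: Dict[str, List[str]],
                 branch: str, onto: str, todo: List[str]) -> None:
        self.branches, self.reflogs = branches, reflogs
        self.branch, self.todo = branch, list(todo)
        self.orig = branches[branch]  # "remember the starting position"
        self.head = onto              # detached HEAD: not a branch, never pushed
        reflogs["HEAD"].append(f"rebase (start): checkout {onto}")

    def step(self) -> None:
        commit = self.todo.pop(0)
        self.head = f"{commit}'"      # rewritten copy; made-up id scheme
        self.reflogs["HEAD"].append(f"rebase (pick): {commit}")

    def finish(self) -> None:
        while self.todo:
            self.step()
        self.branches[self.branch] = self.head  # the only move of the branch
        self.reflogs[self.branch].append(
            f"rebase (finish): {self.orig} -> {self.head}")
        self.reflogs["HEAD"].append(f"rebase (finish): returning to {self.branch}")

    def abort(self) -> None:
        self.head = self.orig         # the branch ref was never touched
        self.reflogs["HEAD"].append(f"rebase (abort): returning to {self.branch}")


def pushable(branches: Dict[str, str]) -> Dict[str, str]:
    # Push looks only at branch refs, never at the detached HEAD.
    return dict(branches)


if __name__ == "__main__":
    branches = {"main": "m1", "topic": "t2"}
    reflogs = {"HEAD": [], "main": [], "topic": []}
    rb = ToyRebase(branches, reflogs, "topic", onto="m1", todo=["t1", "t2"])
    rb.step()
    print(pushable(branches)["topic"])  # t2 (partial work stays private)
    rb.finish()
    print(branches["topic"], len(reflogs["topic"]), len(reflogs["HEAD"]))  # t2' 1 4
```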

I have one more comment and one more issue about the article in
general.

First, while entering the field of version control systems (or even
of interfaces to them) is hard, among other reasons because of network
effects, it should be much easier to come up with a GUI or IDE plugin
built on the same principles. Also, with a GUI it is not much of a
problem if you don't implement everything; users would just fall back
on the command line of the underlying version control system.

Second, I think at least some of the concept design work would not have
been possible when Git was first being created. At the beginning, we
didn't know much about how distributed version control systems would
be used. For example, the very useful "topic branch" workflow was
not even imagined. Mercurial, which was created in parallel with Git,
at the same time, started with the "clone to create a new branch"
paradigm!  Unfortunately, the curse of "worse is better" is that many
misfits are often the price paid for lots of power.

Best regards,
-- 
Jakub Narębski