[no subject]

David Taylor <dtaylor@xxxxxxx> · Wed, 12 Apr 2017 15:43:34 -0400

    On Tue, Apr 11, 2017 at 10:14 PM, taylor, david <David.Taylor@xxxxxxxx> wrote:
    > We are using Git in a distributed environment.
    >
    > In the United States, we have the master repository in one state and a build cluster in a different state.
    > In addition to people in the US doing builds, we have people in other countries (Ireland, India, Israel,
    > Russia, possibly others) doing builds -- using the build cluster.
    >
    > The local mirror of the repository is NFS accessible.  The plan is to make builds faster through the use
    > of work trees.  The build cluster nodes involved in the build will have a worktree in RAM -- checked out
    > for the duration of the build.  Since the worktree is in RAM, it will not be NFS accessible.
    >
    > [Cloning takes 20+ minutes when the network is unloaded.  Building, with sources NFS mounted, takes
    > 5-10 minutes.]

    Using worktrees in this scenario kinda defeats the distributed nature
    of git. Cloning takes long, yes. But what about just "git pull" (or
    "git fetch && git checkout -f" if you want to avoid merging)?

Merging isn't the issue.  Speed is an issue.  Repeatability is an issue.
Disk space is an issue.

If someone does a build on their desktop instead of using the build
cluster, it will take hours rather than minutes.  And if they are not in
Massachusetts, they probably don't have access to the controlled
toolchain that is used to do the builds.

Users may choose to share their repository amongst several work trees
rather than having lots of clones.  Their choice.  Such work trees would
be 'long lived'.

I was thinking of a different use for work trees.  Work trees that would
be short lived -- less than, say, two hours.  Typically less than 30
minutes.

There would be a local repository mirroring the master repository.  It
would be a true mirror -- it would be updated only from the master via
git fetch; there would never be any 'git commit's to it.

When someone who is remote to Massachusetts (which is where the build
cluster lives) wants to do a build, they will invoke a script (yet to be
written) that will determine two things:

  . a SHA1 that exists in the master repository that is an ancestor of
  their workspace

  . their workspace differences relative to that SHA1
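
A minimal sketch of how the requester side of that script might compute
these two items.  It assumes the workspace has a remote-tracking ref
(origin/master here) that follows the master repository; the function
names and the default ref are placeholders, not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: build_base, build_patch and "origin/master" are assumptions.

# Print a commit that is an ancestor of HEAD and is also reachable from
# the given upstream ref (default: origin/master).
build_base() {
    git merge-base HEAD "${1:-origin/master}"
}

# Write the workspace differences relative to commit $1 into file $2.
# This covers committed and uncommitted changes to tracked files;
# untracked files are NOT included.
build_patch() {
    git diff --binary "$1" > "$2"
}
```

Usage would be something like: base=$(build_base) followed by
build_patch "$base" /some/where/ws.patch, run from inside the workspace.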

A lightly loaded build cluster node will be selected and given
the SHA1 and the patch file.  A short lived worktree will be created
which has the SHA1 checked out and the patch applied.

Once the workspace has been recreated, it is built.

[Actually, the build target is decomposed into a dozen pieces that are
built on separate build cluster nodes in parallel using separate copies
of the workspace.]

The build deliverables will be delivered back to the requestor and the
intermediate build products and the work tree will be deleted.
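
Roughly, the build-node side could look like the sketch below.  The
function name and argument order are made up for illustration, and the
patch path is assumed to be absolute.  Note that the final 'git worktree
prune' acts on every worktree entry whose directory is missing on this
machine, not just the one created here.

```shell
#!/bin/sh
# Sketch only; build_in_worktree is a hypothetical helper.
# Usage: build_in_worktree MIRROR SHA PATCH WORKTREE BUILD_CMD [ARGS...]
# PATCH must be an absolute path because git runs inside WORKTREE.
build_in_worktree() {
    bw_mirror=$1 bw_sha=$2 bw_patch=$3 bw_tree=$4
    shift 4

    # Create a short-lived worktree with the requested commit checked out.
    git -C "$bw_mirror" worktree add --detach "$bw_tree" "$bw_sha" || return 1

    bw_status=0
    # Recreate the requester's workspace; skip an empty patch file.
    if [ -s "$bw_patch" ]; then
        git -C "$bw_tree" apply "$bw_patch" || bw_status=$?
    fi

    # Run the build command from the top of the worktree.
    if [ "$bw_status" -eq 0 ]; then
        ( cd "$bw_tree" && "$@" ) || bw_status=$?
    fi

    # Delete the worktree, then drop its administrative entry.
    rm -rf "$bw_tree"
    git -C "$bw_mirror" worktree prune
    return "$bw_status"
}
```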

[The script will also be used by Jenkins for continuous integration builds.]

    > This presents a few problems.
    >
    > When we are done with a work tree, we want to clean up -- think: prune.  But, you cannot prune just
    > one worktree; you have to prune the set.  And no machine has access to all the worktrees.  So, no
    > machine knows which ones are prunable.

    By "prune one worktree", did you mean delete one? Or delete a branch
    the worktree uses and prune the object database?

As in:

  rm -rf /path/to/top/of/work/tree

and then ideally:

  git worktree prune /path/to/top/of/work/tree

or, alternatively, just:

  git worktree prune

    > There is no 'lock' option to 'add'.  If someone does a 'prune' after you do an 'add' and before you do a
    > 'lock', then your 'add' is undone.
    >
    > Are there any plans to add a '[--lock]' option to 'add' to create the worktree in the locked state?  And/or
    > plans to add a [<path>...] option to prune to say 'prune only this path / these paths'?

    So this is "git worktree prune". Adding "worktree add --locked" sounds
    reasonable (and quite simple too, because "worktree add" does lock the
    worktree at creation time; we just need to stop it from releasing the
    lock). I might be able to do it quickly (it does not mean "available
    in the next release" though).

Yes, add the worktree, and if there are no errors, leave it in the
locked state.
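
For reference, what we would have to do today is the two-step version
sketched below (the function name is made up).  The window between the
'add' and the 'lock' is exactly where a 'prune' from another machine can
undo the 'add'.

```shell
#!/bin/sh
# Sketch only; add_locked_worktree is a hypothetical helper.
# Usage: add_locked_worktree PATH COMMIT [REASON]
# Needs a git that has 'git worktree lock'.
add_locked_worktree() {
    git worktree add --detach "$1" "$2" &&
    # Not atomic: a prune run elsewhere can slip in before this line.
    git worktree lock --reason "${3:-in use by build}" "$1"
}
```

Once locked, the entry survives a 'git worktree prune' even when the
worktree directory is not visible from the machine doing the prune.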

    If you need to just prune "this path", I think it's the equivalent of
    "git worktree remove" (i.e. delete a specific worktree). Work has been
    going on for a while to add that command. Maybe it'll be available
    later this year.

I'm not familiar with 'git worktree remove', but, yes, just 'git
worktree prune' a specific specified path.  I was thinking do what 'git
worktree prune' does, but do it only for the path(s) specified on the
command line.
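
To make the intent concrete, here is a rough sketch of the behavior I
have in mind, done by hand.  It assumes the layout described in
gitrepository-layout (a worktrees/<id>/gitdir file holding the absolute
path of the worktree's .git file) and that the path is given exactly as
it was recorded.  The function name is made up; this is not an existing
git command.

```shell
#!/bin/sh
# Sketch only; prune_one_worktree is a hypothetical helper.
# Usage: prune_one_worktree /absolute/path/to/worktree
# Run from inside the repository (or one of its worktrees).
prune_one_worktree() {
    pw_common=$(git rev-parse --git-common-dir) || return 1
    for pw_admin in "$pw_common"/worktrees/*; do
        [ -f "$pw_admin/gitdir" ] || continue
        # Leave locked worktrees alone, as 'git worktree prune' does.
        [ -f "$pw_admin/locked" ] && continue
        # Remove the entry only if it records the given path and the
        # worktree itself is already gone.
        if [ "$(cat "$pw_admin/gitdir")" = "$1/.git" ] && [ ! -e "$1/.git" ]; then
            rm -rf "$pw_admin"
        fi
    done
}
```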

[While I only anticipate giving it one path, I see no need / reason to
limit it to just one path on the command line.]

    > If there are no plans, is the above an acceptable interface?  And if we implemented it, would it be looked
    > upon favorably?

    Speaking of this use case (and this is my own opinion) I think this is
    stretching "git worktree" too much. When I created it, I imagined this
    functionality to be used by a single person.

Cloning is expensive (20+ minutes, ~5 GB); checkout is cheap (seconds,
100's of MBs).  This is / was seen as a speedup.

I also noticed during my experiments that git sometimes looks at the
repository when I wouldn't expect it and gets an error if the directory
above the worktree does not exist.  These experiments were with 2.10.2;
I haven't tried yet with 2.12.2 nor with the head of the master branch.

I haven't (yet?) reported this as a bug, but what I did was:

. create a directory for the tests

  mkdir /home/dtaylor/worktree-tests

. create a couple of worktrees (neither locked)

  cd /top/of/repository
  git worktree add -f --detach /home/dtaylor/worktree-tests/tree_1 first-branch
  git worktree add -f --detach /home/dtaylor/worktree-tests/tree_2 second-branch

Different computer, where /top/of/repository is accessible (it is NFS
mounted), but /home/dtaylor/worktree-tests does not exist (/home is a
local partition, actual home directories go elsewhere), do:

  cd /top/of/repository
  git fetch ==> error message about /home/dtaylor/worktree-tests not existing
  git status ==> error message again

and then the real surprise (I wouldn't expect this one to access the
repository):

  git help ==> error message again

But, of course, if I do:

  cd /

(no repository in sight) and then do:

  git help

it works as expected.

I don't know if 2.12.2 or possibly HEAD fixes this.  My testing was done
with 2.10.2.  I was going to wait until I knew before reporting it.

    -- 
    Duy