Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.

Michal Suchánek <msuchanek@xxxxxxx> · Mon, 4 Mar 2019 14:30:02 +0100

Hello,

On Thu, 21 Feb 2019 17:27:04 +0000
Phillip Wood <phillip.wood@xxxxxxxxxxxx> wrote:

> Hi Eric
> 
> On 21/02/2019 17:12, Eric Sunshine wrote:
> > On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@xxxxxxxxxxxx> wrote:  
> >> On 21/02/2019 13:50, Michal Suchánek wrote:  
> >>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@xxxxxxx> wrote:  
> >>> The problem is we don't forbid worktree names ending with ".lock".
> >>> Which means that if we start to forbid them now existing worktrees
> >>> might become inaccessible.  
> >>
> >> I think it is also racy as the renaming breaks the use of mkdir erroring
> >> out if the directory already exists. One solution is to have a lock
> >> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> >> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> >> have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the
> >> worktree-locks/<dir> is created before worktrees/<dir> then it should be
> >> race free (you will have to remove the lock if the real entry cannot be
> >> created and then increment the counter and try again). Entries could
> >> also be locked on removal to prevent a race there.  
> > 
> > I wonder, though, how much this helps or hinders the use-case which
> > prompted this patch series in the first place; to wit, creating
> > hundreds or thousands of worktrees. Doing so serially was too slow, so
> > the many "git worktree add" invocations were instead run in parallel
> > (which led to "discovery" of race conditions). Using a global worktree
> > lock would serialize worktree creation, thus slowing it down once
> > again.  
> 
> The idea is that there is a per-worktree lock stored under worktree-locks 
> (hence the plural name). Using a separate directory for the locks gets 
> round the problems of name clashes between the lock for a worktree 
> called foo and one called foo.lock and means we can rely on mkdir 
> erroring out if the worktree name already exists as there is no renaming.
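
To check that I read the proposal right, here is a rough sketch of the
scheme as I understand it. It is Python only for brevity and is not git
code: the function names are made up, and the "append a counter to the
name" retry is my assumption about what "increment the counter" means.
The lock is taken with mkdir before the real entry is created, and
listing skips anything that still has a lock.

```python
import os


def add_worktree_entry(common_dir, name):
    """Reserve a unique entry under worktrees/, guarded by worktree-locks/.

    Returns the name actually used (name, name1, name2, ...). The lock is
    left in place; the caller drops it with finish_worktree_entry() once
    the worktree is fully set up.
    """
    locks = os.path.join(common_dir, "worktree-locks")
    trees = os.path.join(common_dir, "worktrees")
    os.makedirs(locks, exist_ok=True)
    os.makedirs(trees, exist_ok=True)

    counter = 0
    while True:
        candidate = name if counter == 0 else f"{name}{counter}"
        lock = os.path.join(locks, candidate)
        try:
            # mkdir fails if the directory exists, so only one process
            # can win the lock for a given name.
            os.mkdir(lock)
        except FileExistsError:
            counter += 1
            continue
        try:
            os.mkdir(os.path.join(trees, candidate))
        except FileExistsError:
            # The real entry already exists: release the lock and retry
            # with the next name.
            os.rmdir(lock)
            counter += 1
            continue
        return candidate


def finish_worktree_entry(common_dir, name):
    """Drop the lock once the worktree is complete."""
    os.rmdir(os.path.join(common_dir, "worktree-locks", name))


def list_worktree_entries(common_dir):
    """List worktree entries, skipping any that are still locked."""
    locks = os.path.join(common_dir, "worktree-locks")
    trees = os.path.join(common_dir, "worktrees")
    # Read worktrees/ first and worktree-locks/ second: an entry that
    # was locked when we saw it but unlocked by the second read has
    # been completed in between, so including it is safe.
    entries = os.listdir(trees) if os.path.isdir(trees) else []
    locked = set(os.listdir(locks)) if os.path.isdir(locks) else set()
    return sorted(e for e in entries if e not in locked)
```

Because the locks live in their own directory, a worktree named
"foo.lock" is just another entry and does not clash with the lock for
"foo". This only covers creation and listing; it says nothing about
which other operations would need to take the lock.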

I suppose this separate directory would work. When are you supposed to
take the lock, though?

When adding a worktree, sure.

When managing worktrees, sure. Otherwise you would see incomplete
worktrees.

When doing anything in git? Probably, because otherwise you could
accidentally use an incomplete worktree. Or somebody deleting a worktree
would fail to remove it because you keep adding files to it.

Isn't git supposed to allow parallel access to the repository?

As things stand, if you wanted to implement worktree locking, you would
need to lock the worktree for *every* operation that touches it, and
for many operations you would have to lock and unlock *all* worktrees
one by one to find the worktree you are supposed to work on.

I don't feel like adding locking to all of git to fix this problem.

Sure, adding enough locking to ensure repository consistency at all
times would be nice, but it also needs to be granular enough not to harm
performance. I can't say I understand the git repository layout and
usage well enough to design that.

Thanks

Michal