Re: [PATCH 0/2] Ensure unique worktree ids across repositories

shejialuo <shejialuo@xxxxxxxxx> · Fri, 29 Nov 2024 19:05:38 +0800

On Fri, Nov 29, 2024 at 02:44:24AM +0000, Caleb White wrote:
> The `es/worktree-repair-copied` topic added support for repairing a
> worktree from a copy scenario. I noted[1,2] that the topic added the
> ability for a repository to "take over" a worktree from another
> repository if the worktree_id matched a worktree inside the current
> repository which can happen if two repositories use the same worktree name.
> 

I partly understand why we would append a hash or a random number to
the current "id" field of "struct worktree". But I don't see a _strong_
reason.

I think we need to figure out the following things:

    1. In what situation could a user end up repairing a worktree from
    another repository?
    2. Why do we need a hash to make the worktree id unique? From the
    description, my intuition is that what we really need is a way to
    distinguish whether the repository is the same.

> This series teaches Git to create worktrees with a unique suffix so
> that the worktree_id is unique across all repositories even if they have
> the same name. For example creating a worktree `develop` would look like:
> 
>     foo/
>     ├── .git/worktrees/develop-5445874156/
>     └── develop/
>     bar/
>     ├── .git/worktrees/develop-1549518426/
>     └── develop/
> 
> The actual worktree directory name is still `develop`, but the
> worktree_id is unique and prevents the "take over" scenario. The suffix
> is given by the `git_rand()` function, but I'm open to suggestions if
> there's a better random or hashing function to use.
> 

The actual worktree directory name is unchanged. But we have changed
"worktree->id" and the layout under ".git/worktrees". This will cause a
lot of trouble, mainly because the worktree name and the worktree id
become inconsistent. Many tools assume that the worktree id is the
worktree name. In other words, there is currently no difference between
the worktree id and the worktree name.

Let me give you an example.

The user could use "git update-ref" to update a ref from another ref.
So, one situation is that the user wants to update (or create) a
main-worktree ref from a linked-worktree ref.

    ```sh
    git init repo && cd repo
    git commit --allow-empty -m initial
    git branch branch-1
    git worktree add ./worktree-1 branch-1
    (cd worktree-1 && git update-ref refs/worktree/branch-2 HEAD)
    ```

The above operations create a worktree-specific ref under
".git/worktrees/<worktree_id>/refs/worktree".

What if we want to do this in the main worktree:

    ```sh
    git update-ref refs/heads/branch-3 \
        worktrees/worktree-1/refs/worktree/branch-2
    ```

So, with this patch, the worktree id is no longer the same as the
worktree name. In that case, "git update-ref" cannot find
".git/worktrees/worktree-1/refs/worktree/branch-2", because the path on
disk has changed to ".git/worktrees/worktree-1-<hash>/...".

If we use a hash or random number to distinguish worktrees, we also
need to change the ref-related code to ignore the "-<hash>" suffix,
because we cannot expect the user to type the extra hash or random
number. That requires a lot of effort.

So, I think we need a _strong_ reason to justify appending extra
characters to the worktree id.

Thanks,
Jialuo