Re: [PATCH v2 0/3] Ensure unique worktree ids across repositories

Caleb White <cdwhite3@xxxxx> · Tue, 03 Dec 2024 03:42:02 +0000

On Mon Dec 2, 2024 at 8:30 PM CST, Junio C Hamano wrote:
> <rsbecker@xxxxxxxxxxxxx> writes:
>
>>>Ah, yes, that exposes (and has to expose) the worktree ID.  It still does
>>>not have to be unique across repositories (only has to unique among the
>>>worktrees that share the same single repository).
>>
>> I might be mistaken, but I think the intent of the worktree series being
>> discussed deliberately wanted the worktree ID to be globally unique on a
>> specific machine.
>
> That is my understanding, but I do not understand why such a
> uniqueness is needed.  Repositories are not even aware of other
> repositories, in any sense to make it matter to know worktree IDs
> other repositories are using.  Until there is an attempt to link a
> worktree that used to belong to a repository to a different
> repository, that is.  At that time, names must be made unique among
> worktrees that belong to the adopting repository, of course, but the
> names used in the original repository for its worktrees would not
> matter at that point, I would think.

Perhaps I should have come up with a better series name; I think
there's been a lot of hang-up over the term "unique". When I refer to
uniqueness in this context, I'm not advocating for strict, absolute
uniqueness in the sense of ensuring no collisions under any conceivable
circumstance, or requiring that repositories are now aware of other
repositories. Instead, I'm discussing uniqueness from a practical
perspective: the combination of a random 32-bit integer from a CSPRNG
with a worktree name should be "unique" for all intents and purposes.
The theoretical risk of a collision does exist, of course, but the
probability is astronomically lower than with the current approach,
rendering the id effectively "unique" in practice.
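
As a rough sanity check on that claim, here is a back-of-the-envelope
sketch in Python (not part of the series). It assumes suffixes are drawn
uniformly and independently from 2^32 values, and the function name is
made up for illustration. It uses the standard birthday-bound
approximation for the chance that at least two of n worktrees sharing
a name also end up with the same suffix:

```python
import math


def collision_probability(n_worktrees: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n same-named worktrees
    draw the same random suffix (birthday bound).

    Assumption: suffixes are uniform and independent over 2**bits values.
    """
    space = 2 ** bits
    pairs = n_worktrees * (n_worktrees - 1) / 2
    # 1 - exp(-pairs / space), computed stably for very small values.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, collision_probability(n))
```

For two repositories that both have a `develop` worktree, this comes
out to roughly 1 in 2^32, whereas without a suffix both ids would
simply be `develop`.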

You're correct in that the worktree ids are only relevant within the
context of a single repository. However, I've already demonstrated that
it's possible for a repository to "repair" (i.e., take over) a worktree
belonging to another repository if the ids match (inferred backlink).
In my experience, there are some pretty common names for worktrees (e.g.,
"main", "master", "develop", "hotfix", etc.), and it's not uncommon for
multiple repositories to have worktrees with the same name. This can be
avoided entirely by introducing some randomness into the worktree id and
significantly reducing the probability of a collision (e.g., one
repository would have a `develop-54678976` id while another would have
a `develop-987465246` id), which is the primary motivation behind this
series.
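
To illustrate the idea only (the series itself is C, and the exact
format may differ), a minimal Python sketch of building such an id could
look like this. The function name and the decimal `name-number` layout
are assumptions based on the examples above:

```python
import secrets


def make_worktree_id(name: str) -> str:
    """Return a hypothetical worktree id of the form '<name>-<suffix>'.

    Assumption: the suffix is a 32-bit integer from a CSPRNG, printed
    in decimal, as in the `develop-54678976` example.
    """
    suffix = secrets.randbits(32)  # uniform in [0, 2**32)
    return f"{name}-{suffix}"


if __name__ == "__main__":
    # Two repositories creating a "develop" worktree would almost
    # certainly get different ids.
    print(make_worktree_id("develop"))
    print(make_worktree_id("develop"))
```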

As I've mentioned earlier, the concept of a suffix is not new and should
not be a breaking change. It's already possible to have worktrees with
a different id from the public worktree directory name, so users and
scripts should not just assume them to be the same (this is buggy
behavior), but instead should be querying the worktree id from the `.git`
file or `git rev-parse --git-dir` if they really need it (very rare).
As part of this series I did add the worktree id to the `worktree list`
output to make it easier for scripts to query this information if they
do need it.
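
For scripts that do need the id, a minimal sketch of reading it from
the `.git` file might look like the following. It assumes the usual
linked-worktree layout, where the file holds a single
`gitdir: <repo>/.git/worktrees/<id>` line, and POSIX-style paths; the
function name is hypothetical:

```python
from pathlib import PurePosixPath


def worktree_id_from_gitfile(contents: str) -> str:
    """Extract the worktree id from the text of a worktree's `.git` file.

    Assumption: the first line is 'gitdir: <repo>/.git/worktrees/<id>'.
    """
    lines = contents.strip().splitlines()
    prefix = "gitdir: "
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("not a gitfile: missing 'gitdir: ' line")
    gitdir = PurePosixPath(lines[0][len(prefix):].strip())
    # Submodule gitfiles also use 'gitdir:', so check the parent directory.
    if gitdir.parent.name != "worktrees":
        raise ValueError("gitdir does not point into a worktrees directory")
    return gitdir.name


if __name__ == "__main__":
    sample = "gitdir: /home/user/repo/.git/worktrees/develop-54678976\n"
    print(worktree_id_from_gitfile(sample))
```

The point is that the id comes from the last path component of the
gitdir, not from the name of the worktree's own directory.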

Perhaps this "take-over" scenario is not that big of a concern in
practice; I just noticed that this behavior was made possible by the
`es/worktree-repair-copied` topic and thought it was worth addressing.
If it's decided that this is not that big of a concern, then I suppose
this series can be dropped (although I've made some other QoL
improvements that may be useful).

Best,

Caleb