On Sat, 2015-08-01 at 08:51 +0200, Michael Haggerty wrote:
> On 08/01/2015 07:12 AM, Junio C Hamano wrote:
> > On Fri, Jul 31, 2015 at 8:59 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> >>
> >> It seems to me that adding a new top-level "worktree-refs" directory is
> >> pretty traumatic. Lots of people and tools will have made the assumption
> >> that all "normal" references live under "refs/".
> >> ...
> >> It's all a bit frightening, frankly.
> >
> > I actually feel the prospect of pluggable ref backend more frightening,
> > frankly ;-). These bisect refs are just like FETCH_HEAD and MERGE_HEAD,
> > not about the primary purpose of the "repository" to grow the history of refs
> > (branches), but about ephemeral pointers into the history used to help keep
> > track of what is being done in the worktree upstairs. There is no need for
> > these to be visible across worktrees. If we use the real refs that are global
> > in the repository (as opposed to per-worktree ones), we would hit the backend
> > database with transactions to update these ephemeral things, which somehow
> > makes me feel stupid.
>
> Hmm, ok, so you are thinking of a remote database with high latency. I
> was thinking more of something like LMDB, with latency comparable to
> filesystem storage.
>
> These worktree-specific references might be ephemeral, but they also
> imply reachability, which means that they need to be visible at least
> during object pruning. Moreover, if the references don't live in the
> same database with the rest of the references, then we have to deal with
> races due to updating references in different places without atomicity.
>
> The refs+object store is the most important thing for maintaining the
> integrity of a repo and avoiding races. To me it seems easier to do so
> if there is a single refs+objects store than if we have some references
> over here on the file system, some over there in a LMDB, etc.
> So my gut feeling is for the primary reference storage to be in a single
> reference namespace that (at least in principle) can be stored in a single
> ACID database.
>
> For each worktree, we could then create a different view of the
> references by splicing parts of the full reference namespace together.
> This could even be based on config settings so that we don't have to
> hardcode information like "refs/bisect/* is worktree-specific" deep in
> the references module. Suppose we could write
>
>     [worktree.refs]
>         map = refs/worktrees/*:
>         map = refs/bisect/*:refs/worktrees/[worktree]/refs/bisect/*
>
> which would mean (a) hide the references under "refs/worktrees", and (b)
> make it look as if the references under
> refs/worktrees/[worktree]/refs/bisect actually appear under refs/bisect
> (where "[worktree]" is replaced with the current worktree's name). By
> making these settings configurable, we allow other projects to define
> their own worktree-specific reference namespaces too.
>
> The corresponding main repo might hide "refs/worktrees/*" but leave its
> refs/bisect namespace exposed in the usual place.
>
> "git prune" would see the whole namespace as it really is so that it can
> compute reachability correctly.

I think making this configurable is (a) overkill and (b) dangerous.
It's dangerous because which refs are per-worktree matters to the
correct operation of git, and allowing users to mess with that seems
like a big mistake. Instead, we should figure out a simple scheme and
define it globally.

I think refs/worktree -> refs/worktrees/[worktree]/ would do fine as a
fixed scheme, if we go that route.

We would need two separate views of the refs hierarchy, though: one
used by prune (and pack-refs) that is non-mapped (that is, includes
per-worktree refs for each worktree), and one for general use that is
mapped. Maybe this is just a flag to the ref traversal functions.

But I'm not sure that this is really the right way to go.

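To make the fixed scheme and the two views concrete, here is a minimal sketch in Python. It is an illustration only, not git's code: the flat dict standing in for a ref store, the names `to_storage` and `iter_refs`, and the `raw` flag are all assumptions made up for this example.

```python
# Hypothetical prefixes for the fixed scheme refs/worktree -> refs/worktrees/[worktree]/
WORKTREE_PREFIX = "refs/worktree/"   # per-worktree namespace as callers see it
STORAGE_PREFIX = "refs/worktrees/"   # where those refs would actually be stored


def to_storage(refname: str, worktree: str) -> str:
    """Map a caller-visible ref name to its stored name for one worktree."""
    if refname.startswith(WORKTREE_PREFIX):
        rest = refname[len(WORKTREE_PREFIX):]
        return f"{STORAGE_PREFIX}{worktree}/{rest}"
    return refname


def iter_refs(store: dict, worktree: str, raw: bool = False):
    """Yield (name, value) pairs from a flat ref store (a plain dict here).

    raw=True  gives the non-mapped view (the one prune or pack-refs would
              use): every stored ref, including every worktree's own refs.
    raw=False gives the mapped view for general use: this worktree's refs
              appear under refs/worktree/, other worktrees' refs are hidden.
    """
    mine = f"{STORAGE_PREFIX}{worktree}/"
    for name in sorted(store):
        if raw:
            yield name, store[name]
        elif name.startswith(mine):
            yield WORKTREE_PREFIX + name[len(mine):], store[name]
        elif not name.startswith(STORAGE_PREFIX):
            yield name, store[name]
```

With a store holding `refs/heads/master`, `refs/worktrees/wt1/bisect/bad` and `refs/worktrees/wt2/bisect/bad`, the mapped view for `wt1` would list `refs/heads/master` and `refs/worktree/bisect/bad`, while the raw view would list all three stored names.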
As I understand it, we don't presently do many transactions that include
both pseudorefs or per-worktree refs and other refs. And we definitely
don't want to move pseudorefs into the database, since there's so much
code that assumes they're files.

Also, the vast majority of refs are common, rather than per-worktree. In
fact, the only per-worktree refs I've seen mentioned so far are the
bisect refs, NOTES_MERGE_REF, and HEAD. Of these, only HEAD is needed
for pruning. Are there more that I haven't thought of?

So I'm not sure the gain from moving per-worktree refs into the database
is that great.

There are some downsides to moving per-worktree refs into the database:

1. More operations in one worktree can now contend with operations in
   another worktree for the database. LMDB only allows a single write
   transaction at a time.

2. The refs API would be more complicated: it would need to deal with
   remapped vs raw ref paths.

3. Refs backends would need to have functions to prune per-worktree
   data when a worktree is destroyed.

4. We would still need to deal with pseudorefs, so there's still some
   missing transactional safety, and still the complication of dealing
   with files on the filesystem.

Simply treating refs/worktree as per-worktree, while the rest of refs/
is not, would be a few dozen lines of code. The full remapping approach
is likely to be a lot more. I've already got the lmdb backend working
with something like the simple approach. If we decide on a complicated
approach, I am likely to run out of time to work on pluggable backends.
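For comparison, the simple approach (refs/worktree is per-worktree, the rest of refs/ is shared) might look roughly like the sketch below. This is Python for illustration only, not the actual lmdb backend: `is_per_worktree`, `RefRouter`, and the two dicts standing in for the shared backend and the per-worktree files are hypothetical, and treating every name outside refs/ as a per-worktree pseudoref is an assumption.

```python
def is_per_worktree(refname: str) -> bool:
    """Hypothetical rule: refs/worktree/* is per-worktree, and so is any
    name outside refs/ (pseudorefs such as HEAD), which stays a file."""
    return refname.startswith("refs/worktree/") or not refname.startswith("refs/")


class RefRouter:
    """Send each ref to one of two stores without remapping its name."""

    def __init__(self, shared: dict, per_worktree: dict):
        self.shared = shared              # stand-in for the shared backend
        self.per_worktree = per_worktree  # stand-in for this worktree's files

    def _store_for(self, refname: str) -> dict:
        return self.per_worktree if is_per_worktree(refname) else self.shared

    def update(self, refname: str, value: str) -> None:
        self._store_for(refname)[refname] = value

    def read(self, refname: str):
        return self._store_for(refname).get(refname)
```

Because names are never rewritten, there is only one view of each store. The cost is that an update touching both stores is not atomic, which is the trade-off discussed above.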