On Tue, Aug 02, 2011 at 10:28:10AM +0200, martin f krafft wrote:

> TopGit does what you suggest (a parallel ref structure), but there
> are three problems with this, which I am trying to address:
>
> 1. you need to ensure that these refs are pushed and fetched,
>    which requires set up and possible migration issues when things
>    change, and can cause big problems for contributors who just so
>    happened to forget.

I agree that is an annoyance, but it is one we can deal with. In the
near term, I wonder if a "tg clone" would be appropriate to add the
extra fetch refspecs when cloning (or even a "tg init" inside an
existing git repo -- I don't actually use topgit, so I'm not sure what
the usual initialization process, if any, is).

In the longer term, it might be nice if git was better at sharing
third-party refs. The problem is that we don't know what the refs mean,
so we don't know which ones are appropriate for sharing. Maybe we could
do something like "refs/shared/topgit/*", and git by default would push
and pull items under refs/shared?

There have also been proposals to have a more mirror-like structure to
what we fetch from remotes. E.g., to put remote refs/tags into
refs/remotes/origin/refs/tags, and similar for notes. It may be that it
is sensible for us to just fetch everything from a remote into
refs/remotes, including unknown hierarchies like topgit.

> 2. the additional refs confuse people a lot -- and I can attest to
>    that because I have also at times found myself overwhelmed by
>    them when staring at gitk.

Using "gitk --all", I assume? I agree it is annoying, though "gitk
--branches" probably better specifies what you want (unless you stick
the parallel ref structure under refs/heads above, which is also a
solution to the "should it be fetched" plan).

> 3. once a ref updates, we need to keep a pointer to the previous
>    location, since one of the goals is the ability to be able to
>    return to a point in history (e.g.
>    for security updates to a stable package, or backports).
>    Additional refs enhance the aforementioned two problems.

Reflogs provide a linear history of the ref updates, but I suspect you
want to be able to push and pull these histories. Which reflogs will
not do.

If you want to version the state of refs, then using raw refs isn't the
right answer. You want a separate commit history with trees that map
ref names to commits or other objects. Which is _almost_ what notes
are; they map commit sha1s, but you want to map ref names.

> Therefore I thought it would be sensible to store these data in
> commits. When the data change, there will always be a new commit to
> store these data, and we do *not* want to update the data in
> previous commits. Finding the data then becomes backtracking the
> branch history until a commit is found containing them.

That seems to me like you are sticking information in a commit that is
not actually about the commit, but about the ref that happens to point
to the commit. What if I have two refs that point to the same commit,
but with two different topgit bases? What about years later, when that
information isn't interesting anymore? You're still carrying the cruft
inside your commit objects.

> > However, implementing such a thing would mean you have an awkward
> > transition period where some versions of git think the referenced
> > object is relevant, and others do not. That's something we can
> > overcome, but it's going to require code in git, and possibly
> > a dormant introduction period.
>
> Indeed. This could be addressed by letting a tool like TopGit require
> a minimum version of Git. For a while, this will burden developers,
> but ensure that it works. Over time, this will cease to be
> a problem.

Keep in mind that your requirement is not just a local thing. Object
reachability is something that both sides of a transfer need to agree
on. So imagine you use TopGit with a new version of git, and you push
to a site like GitHub.
The remote side will take your objects, but it will not send them back
to anyone who fetches from your repository (since it has no idea
they're relevant). And it will probably prune them after a week or two.

> What do you think about using the idea of orphan parent commits
> (OPC) for now? These are conceptually closest to the x-*-ref
> pointers, do not require extra setup, pollute history only a little
> bit (IMHO), and slot in with Git and fsck/gc alright.

It doesn't seem like a good idea to me. Parent pointers have a
well-defined meaning, and other parts of git (and other tools, even)
are going to assume that's what your parent pointers mean. They are
used in merge base calculations, for example. I _think_ you are mostly
safe here, because your OPC wouldn't have any real history to it, so
finding a merge base down that path would be fruitless. But consider
something like "diff", which shows a merge commit differently than a
regular commit. Your commits will unexpectedly appear as merges to git,
and we will show a combined diff versus the OPC, which is going to be
ugly.

> I am not yet sure what information needs storing. Right now, I am
> keeping five fields:
> [...]

Thanks, that helped with getting a sense of what you're doing.

> I think there are two questions:
>
> 1. would x-*-ref be a suitable idea for Git core?
>
>    I think the answer is yes, as (I think) it's well-defined and
>    I cannot see any problems with it, really.

I think it's a nice idea for extensibility. And if it had been there
from day one, there would be no problems. But now we have to deal with
the transition period, and the fact that two different versions of git
will have different ideas about the set of objects that are reachable
from a given commit.

> 2. can we prevent abuse?
>
>    No, never. But just like you cannot abuse X-* headers in the
>    RFC822 format due to their design, x-*-ref abuse would only
>    affect those who chose it.

I don't worry about abuse.
You can already stick random cruft in a commit header, and you can
already connect objects to a commit via tree entries. This idea is just
giving git some rules for dealing with it.

I'm still not 100% convinced you want per-commit storage, though, and
not per-ref storage.

-Peff
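
The "separate commit history with trees that map ref names to commits"
idea from the reply above can be illustrated with a small toy model.
This is only a sketch under stated assumptions: the in-memory store,
the JSON serialization, the helper names (put, get, snapshot,
ref_history) and the example ref name are all hypothetical, and none
of it is git's real object format or an existing git or TopGit
interface.

```python
import hashlib
import json

# Toy content-addressed store: object id -> serialized bytes.
# Assumption: an in-memory stand-in, not git's real object database.
store = {}


def put(obj):
    """Serialize obj canonically, store it, and return its sha1 hex id."""
    data = json.dumps(obj, sort_keys=True).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid


def get(oid):
    """Load and decode a previously stored object."""
    return json.loads(store[oid])


def snapshot(refs, parent=None):
    """Record a ref-name -> object-id map as a new snapshot commit.

    The tree is keyed by ref name, whereas notes are keyed by commit sha1.
    """
    tree = put({"type": "tree", "entries": refs})
    return put({"type": "commit", "tree": tree, "parent": parent})


def ref_history(tip, name):
    """Yield the values ref `name` has had, newest first.

    Consecutive snapshots with an unchanged value are reported once.
    """
    last = None
    oid = tip
    while oid is not None:
        commit = get(oid)
        value = get(commit["tree"])["entries"].get(name)
        if value is not None and value != last:
            yield value
        last = value
        oid = commit["parent"]


# Illustrative ref name and placeholder object ids.
first = snapshot({"refs/top-bases/feature": "1111"})
second = snapshot({"refs/top-bases/feature": "2222"}, parent=first)
print(list(ref_history(second, "refs/top-bases/feature")))
```

Because each snapshot is an ordinary object with a parent pointer, the
whole history of ref states could be pushed and fetched like any other
branch, which is the property reflogs lack. Two refs pointing at the
same commit can also carry different metadata, since the map is keyed
by ref name rather than stored in the commit the ref points to.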