On Wed, Nov 21, 2012 at 6:08 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jonathan Nieder <jrnieder@xxxxxxxxx> writes:
>
>> Never mind that others have said that that's not the current interface
>> (I don't yet see why it would be a good interface after a transition,
>> but maybe it would be). Still, hopefully that clarifies the intended
>> meaning.
>
> Care to explain how the current interface is supposed to work, how
> fast-export and transport-helper should interact with remote helpers
> that adhere to the current interface, and how well/correctly the
> current implementation of these pieces work?
>
> What I am trying to get at is to see where the problem lies. Felipe
> sees bugs in the aggregated whole. Is the root cause of the problems
> he sees some breakages in the current interface? Is the interface
> designed right but the problem is that the implementation of the
> transport-helper is buggy and driving fast-export incorrectly? Or is
> the implementation of the fast-export buggy and emitting wrong results,
> even though the transport-helper is driving fast-export correctly?
> Something else?

Let me give it a shot at explaining the case for remote helpers that
use export/import.

== listing ==

All operations begin with the transport helper requesting a list of
refs. Basically 'git show-ref'.

== fetching ==

In fetch mode the transport helper initiates the process by requesting
refs from the remote helper, like 'master', or 'devel', and so on.
These refs were previously provided by the remote helper itself in the
"listing" step.

It is entirely the responsibility of the remote helper to decide what
to do: nothing, only update the ref pointers, retrieve the whole
repository, retrieve only the listed refs, etc. It's also the
responsibility of the remote helper to keep track of marks, the last
known commits the refs pointed to, update local transitory
repositories, etc.

It's also the responsibility of the remote helper to send the right
'feature' commands to fast-import for everything, including where to
store the marks. Note that there are two sets of marks: the marks of
the remote helper, which could be anything (JSON, text files, binary,
etc.) and don't contain git SHA-1s, and the git marks, which do
contain git SHA-1s and are exported/imported by fast-import. *Both*
are totally under control of the remote helper.

At this point git (transport helper) has absolutely no idea what's
going on; the communication is completely between the remote helper
and fast-import.

After this process has finished, control goes back to the transport
helper, which proceeds to check what fast-import did. Then the result
is shown to the user as the typical fetch that updated certain refs.

== pushing ==

In this mode the roles are reversed: now git (transport helper) is in
control, and everything that happens depends on what commands are
passed to fast-export. Now the remote helper is a passive receiver of
data, and has two options: receive it or die.

Which refs get updated, and how, is entirely the responsibility of the
transport helper. The only control the remote helper has is before the
export begins, in the configuration (capabilities command) that
happens at the very beginning (before listing), where it specifies the
features it supports, which are then used to pass the relevant
arguments to fast-export. And these capabilities are very limited:

* import-marks
* export-marks
* refspec

After the push has finished, the remote helper then proceeds to report
which refs were actually updated, and the user gets notified.

== details ==

As should be obvious by now, there aren't many ways in which a remote
helper can screw things up (other than the parsing and generation of
data for fast-import/export). The only tricky part is the refspec.

To function properly, a remote helper should specify a refspec such as
'refs/heads/*:refs/test/heads/*'. This way, all the changes a remote
helper makes are isolated in a specific refspec namespace, and the
update to normal git happens in a controlled way.

However, the refspec only makes sense in the *fetching* mode; the
remote helper is supposed to send updates in the form of
'commit refs/test/heads/master', not 'commit refs/heads/master'
(although in some cases that might work, but I'm not sure which). But
when pushing, the remote helper will receive the refs in the normal
form 'refs/heads/master'. Also, the namespaced refs are only updated
when fetching, not when pushing.

Marks are very straightforward; the same import and export marks
should be specified for both importing and exporting.

Everything works mostly fine as long as the remote helper follows
this. Things break in all sorts of ways when it doesn't. But I want to
emphasize again that there aren't many ways in which a remote helper
can screw things up: marks, or refspec, that's it. *Especially* when
pushing.

== no marks ==

Let's imagine a very simple repository with 3 commits, which gets
pushed to a remote one:

4e891f6 :3
d9d17c3 :2
e1aef7b :1

I'm obviously simplifying the marks, but essentially that's what
fast-export would do when pushing commits to a remote helper: the
parent of :2 is :1, and the parent of :3 is :2, but the remote side
*never* sees any git SHA-1, because they are not interesting in any
way; there's nothing useful that can be done with them.

The remote side would generate commits such as:

:3 103
:2 102
:1 100

Again, for simplification purposes (you can picture them as mercurial
revs).

Now the push has finished. The marks are gone (no marks). What happens
when you fetch?

You might think that we will get only the commits after :3, but that's
not the case. The transport helper would use 'refs/test/heads/master'
to find out the last commit, but that doesn't get updated when
pushing, only when fetching, so we would start from the top.

4e891f6 :3
d9d17c3 :2
e1aef7b :1

:3 103
:2 102
:1 100

The same will happen if you push, because push also uses
'refs/test/heads/master'.

But *now* that we are doing a fetch, the 'refs/test/heads/master'
pointer is updated to 4e891f6. But don't think that those marks are
the same as the previous ones: they happen to be the same because they
were generated the same way, but they are completely independent.

What happens when you push now?

Now 'refs/test/heads/master' is pointing to 4e891f6, and suppose we
have two new commits:

88764ee
4607106
4e891f6 :3 <- I'm putting these for reference, but in reality they are gone
d9d17c3 :2
e1aef7b :1

The transport helper would do an export of
'^refs/test/heads/master refs/heads/master', or '^4e891f6 88764ee'.

And here comes the interesting part: what is the parent of 4607106?
It's not :3, because that mark is gone, and in fact, even if we sent
:3, things would break down because the other side has no idea what :3
means; it's gone, kaput.

What really happens is:

88764ee :2
4607106 :1

This is a new tree. That's exactly what you would expect if you do
'git fast-export ^v1.8.0^ master': export all the commits as if v1.8.0
was the root. But in the context of remote helpers, that's not what we
want.

What can we do to fix this? Let's suppose that through some magic we
get the parent of 4607106 to be 4e891f6; is that helpful? No. To the
remote helper 4e891f6 is useless. What we need is 103, but without
marks, we can't find that out. Maybe if we stored it in the last run?
We need to parse the git marks, and then match our marks with those,
and we could get a mapping like '4e891f6 -> 103', but what if the
parent is 102?

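To make that matching step concrete, here is a minimal sketch in Python of joining the git marks with a helper's own marks. The ':<mark> <sha1>' line format is what fast-import and fast-export write to their marks files; everything else is an assumption for illustration: the function names, the plain dict standing in for the helper's marks, and the abbreviated SHA-1s reused from the example above (a real marks file holds full 40-character ones).

```python
def parse_git_marks(text):
    """Parse marks lines of the form ':<mark> <sha1>' into {mark: sha1}."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split()
        marks[mark] = sha
    return marks


def sha_to_remote_rev(git_marks, helper_marks):
    """Join both mark tables on the mark id, giving {git sha1: remote rev}.

    helper_marks is a hypothetical {mark: remote rev} table; a real helper
    is free to store this however it likes.
    """
    return {
        sha: helper_marks[mark]
        for mark, sha in git_marks.items()
        if mark in helper_marks
    }


# Illustrative data only, taken from the three-commit example.
GIT_MARKS = """\
:1 e1aef7b
:2 d9d17c3
:3 4e891f6
"""
HELPER_MARKS = {":1": 100, ":2": 102, ":3": 103}

mapping = sha_to_remote_rev(parse_git_marks(GIT_MARKS), HELPER_MARKS)
print(mapping["4e891f6"])  # 103
print(mapping["d9d17c3"])  # 102

# With the helper's marks gone, nothing can be resolved at all.
print(sha_to_remote_rev(parse_git_marks(GIT_MARKS), {}))  # {}
```

The join can only resolve a parent when both tables cover it, whether that parent is 103 or 102, so a single entry remembered from the last run is not enough.
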
So, we need a mapping for all the marks, and then we have to store
that mapping anyway. And guess what? We are back to using marks again!
Except that instead of using the standard git way, we are using a
custom hacky way.

Are there other solutions? Maybe we can store the information in refs:

4e891f6 refs/test/ids/103
d9d17c3 refs/test/ids/102
e1aef7b refs/test/ids/100

But that would also require parsing the git marks, and going outside
of the intended fast-export tool would kind of defeat the purpose of
being fast and efficient, and it would still be very hacky.

And let's not even go into what would be needed for 'git fast-export'
to actually generate 4e891f6 in the first place, as that would
probably require changes that would break other things.

So no, you can't do it without marks.

And why are we even discussing this? Why would anybody want to avoid
marks? Not only is there no other way to achieve the same, marks are
cheap and efficient, as efficient as any other solution could be, and
then some.

And do we have any real remote helpers that try to do export/import
without marks? No, heck, we don't even have fake ones. It just doesn't
work. Seriously.

And my patches actually make it work: if there are no marks, then
_everything_ is pushed.

I don't see the point of supporting the functionality of no marks;
clearly nobody is using that because it just doesn't work. Nobody has
shown a shred of evidence to the contrary. With my patches, at least
we try to do something without failing too miserably.

Cheers.

-- 
Felipe Contreras