Elijah Newren <newren@xxxxxxxxx> writes: > ... Here's what I think are the relevant points > (and yeah, it's lengthy): > > > The revision traversal machinery typically processes and returns all > children before any parent. fast-export needs to operate in the > reverse fashion, handling parents before any of their children in > order to build up the history starting from the root commit(s). This > would be a clear case where we could just use the revision traversal > machinery's "reverse" option to achieve this desired affect. > > However, this wasn't what the code did. It added its own array for > queuing. The obvious hand-rolled solution would be to just push all > the commits into the array and then traverse afterwards, but it didn't > quite do that either. It instead attempted to process anything it > could as soon as it could, and once it could, check whether it could > process anything that had been queued. As far as I can tell, this was > an effort to save a little memory in the case of multiple root commits > since it could process some commits before queueing all of them. This > involved some helper functions named has_unshown_parent() and > handle_tail(). For typical invocations of fast-export, this > alternative essentially amounted to a hand-rolled method of reversing > the commits -- it was a bunch of work to duplicate the revision > traversal machinery's "reverse" option. > > This hand-rolled reversing mechanism is actually somewhat difficult to > reason about. It takes some time to figure out how it ensures in > normal cases that it will actually process all traversed commits > (rather than just dropping some and not printing anything for them). > > And it turns out there are some cases where the code does drop commits > without handling them, and not even printing an error or warning for > the user. Due to the has_unshown_parent() checks, some commits could > be left in the array at the end of the "while...get_revision()" loop > which would be unprocessed. This could be triggered for example with > git fast-export main -- --first-parent > or non-sensical traversal rules such as > git fast-export main -- --grep=Merge --invert-grep > > While most traversals that don't include all parents should likely > trigger errors in fast-export (or at least require being used in > combination with --reference-excluded-parents), the --first-parent > traversal is at least reasonable and it'd be nice if it didn't just > drop commits. It'd also be nice to have a simpler "reverse traversal" > mechanism. Use the "reverse" option of the revision traversal > machinery to achieve both. The above is a very helpful and understandable explanation of what is going on. I am a bit puzzled by the very last part, though. By "It'd also be nice to have a simpler 'reverse traversal' mechanism", do you mean that the end users have need to control the direction the traversal goes (in other words, they use "git fast-export" for some thing, and "git fast-export --reverse" to achieve some other things)? Or do you just mean that we need to do a reverse traversal but that is already available in the revision traversal machinery, and not using it and rolling our own does not make sense? > Even for the non-sensical traversal flags like the --grep one above, > this would be an improvement. For example, in that case, the code > previously would have silently truncated history to only those commits > that do not have an ancestor containing "Merge" in their commit > message. After this code change, that case would would include all "would would" -> "would" > commits without "Merge" in their commit message -- but any commit that > previously had a "Merge"-mentioning parent would lose that parent > (likely resulting in many new root commits). While the new behavior > is still odd, it is at least understandable given that > --reference-excluded-parents is not the default. Nicely written.