On Fri, Dec 10, 2021 at 1:55 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Elijah Newren <newren@xxxxxxxxx> writes: > > > ... Here's what I think are the relevant points > > (and yeah, it's lengthy): > > > > > > The revision traversal machinery typically processes and returns all > > children before any parent. fast-export needs to operate in the > > reverse fashion, handling parents before any of their children in > > order to build up the history starting from the root commit(s). This > > would be a clear case where we could just use the revision traversal > > machinery's "reverse" option to achieve this desired affect. > > > > However, this wasn't what the code did. It added its own array for > > queuing. The obvious hand-rolled solution would be to just push all > > the commits into the array and then traverse afterwards, but it didn't > > quite do that either. It instead attempted to process anything it > > could as soon as it could, and once it could, check whether it could > > process anything that had been queued. As far as I can tell, this was > > an effort to save a little memory in the case of multiple root commits > > since it could process some commits before queueing all of them. This > > involved some helper functions named has_unshown_parent() and > > handle_tail(). For typical invocations of fast-export, this > > alternative essentially amounted to a hand-rolled method of reversing > > the commits -- it was a bunch of work to duplicate the revision > > traversal machinery's "reverse" option. > > > > This hand-rolled reversing mechanism is actually somewhat difficult to > > reason about. It takes some time to figure out how it ensures in > > normal cases that it will actually process all traversed commits > > (rather than just dropping some and not printing anything for them). > > > > And it turns out there are some cases where the code does drop commits > > without handling them, and not even printing an error or warning for > > the user. Due to the has_unshown_parent() checks, some commits could > > be left in the array at the end of the "while...get_revision()" loop > > which would be unprocessed. This could be triggered for example with > > git fast-export main -- --first-parent > > or non-sensical traversal rules such as > > git fast-export main -- --grep=Merge --invert-grep > > > > While most traversals that don't include all parents should likely > > trigger errors in fast-export (or at least require being used in > > combination with --reference-excluded-parents), the --first-parent > > traversal is at least reasonable and it'd be nice if it didn't just > > drop commits. It'd also be nice to have a simpler "reverse traversal" > > mechanism. Use the "reverse" option of the revision traversal > > machinery to achieve both. > > The above is a very helpful and understandable explanation of what > is going on. I am a bit puzzled by the very last part, though. By > "It'd also be nice to have a simpler 'reverse traversal' mechanism", > do you mean that the end users have need to control the direction > the traversal goes (in other words, they use "git fast-export" for > some thing, and "git fast-export --reverse" to achieve some other > things)? Or do you just mean that we need to do a reverse traversal > but that is already available in the revision traversal machinery, > and not using it and rolling our own does not make sense? Sorry, yeah, I meant the latter. I do not think end users should have control of the direction. Perhaps if that was reworded to "...It'd also be nice for future readers of the code to have a simpler..." it'd be clearer? > > Even for the non-sensical traversal flags like the --grep one above, > > this would be an improvement. For example, in that case, the code > > previously would have silently truncated history to only those commits > > that do not have an ancestor containing "Merge" in their commit > > message. After this code change, that case would would include all > > "would would" -> "would" Good catch. > > commits without "Merge" in their commit message -- but any commit that > > previously had a "Merge"-mentioning parent would lose that parent > > (likely resulting in many new root commits). While the new behavior > > is still odd, it is at least understandable given that > > --reference-excluded-parents is not the default. > > Nicely written. Thanks. :-)