On 1/19/2021 9:52 AM, Antonio Russo wrote: > On 1/18/21 6:24 PM, Junio C Hamano wrote: >> Antonio Russo <aerusso@xxxxxxxxxxx> writes: >> >>> As a side note, would this list be willing to look at patches that remove >>> the need to use revs->limited? Adding new features would be much easier if >>> we could restrict git to use streaming algorithms for these simplifications. >> >> Do you mean you will write new implementations of operations like >> "--simplify-merges", "--ancestory-path" and all its friends without >> the need for the "limited" bit? > > Yes. > >> Willing to look at? I do not know about others, but sure, it would >> make be extremely excited. >> >> You may be able to rewrite some other operations that implicitly >> require the limited bit into an incremental implementation, but I am >> skeptical that you can do "simplify-merges" in a streaming fashion >> in such a way that we'd dig a bit but not all the way down before >> making decisions on which commits should be included in the output >> and how their parenthood relationship should appear etc. I and >> Linus tried independently and we did not get that far in our >> attempts (note: there wasn't commit-graph back then). > > Yeah, it's a big task (but, it'd be useful work, rather than doing > another rewrite of my feature to make it work when revs->limited). > Each of the flags (simplify-merges, ancestry-path, etc.) is its own > little sub-project. > > I haven't thought about any one flag specifically, but the fact that > two complete code paths exist right now just seem like a nightmare. > I'm not necessarily interested in making the implementations faster, > but rather getting rid of the duplicated code path. It's definitely a long shot to remove the limited flag altogether, but a good start would be to remove sort_in_topological_order(). All of its logic is already re-implemented in the streaming version (see every use of "struct topo_walk_info" in revision.c). The streaming is designed to be faster in the presence of a commit-graph file, but it should still work correctly without one. Since all commits with have "generation number infinity," each phase of the walk will do the same as the limit_list() and walk all reachable commits before returning the first result. The only thing to ask is: what is the overhead of the streaming version over sort_in_topological_order()? Is there one? You'd need to do performance tests without a commit-graph file. This has long been on my own TODO list, but I haven't been able to prioritize it over other things. I'd be happy to review the change. It also would be a good way to familiarize yourself with the code and how we already do some streaming things, even when "extra" information is needed to accomplish that. Thanks, -Stolee