On 3/7/2023 8:56 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Feb 10 2023, Derrick Stolee wrote:
>> All this is to say, that I'd like to see this API start with the smallest
>> possible surface area and with the simplest implementation, and then I'd
>> be happy to contribute those algorithms within the API boundary while the
>> CLI is handled independently.
>
> I hear your concern about leaving this open for optimization, and in
> general I'd vehemently agree with it, except for needing to eventually
> feed a command-line to setup_revisions().

The most-correct way to build this, with full optimizations, does not
involve revision.c at all, so this "eventually" is incorrect. It's only
something to do for the "first" implementation, as a reference.

In order to do the single-walk approach for every path simultaneously,
we _must_ have full control of the commit walk.

There was a time where we had done a single-walk approach by letting
the revision machinery walk all commits that changed the base tree,
then looked for changes to the contained paths. However, this results
in _incorrect_ results because commits that would normally be ignored
by the simplified history walk for "<dir>/<entry>" were not ignored by
the simplified history walk for "<dir>/" and thus that algorithm
presented _incorrect results_.

For that reason, doing a single walk that outputs the blame-tree
results for each path must have full control over which commits are
walked and which paths could emit a change for those commits. This
means we must not use revision.c as a base for full control.

> Ideally the revision API would make what you're describing easy, but the
> way it's currently implemented (and changing it would be a much larger
> project) someone who'd like to pass structured options in the way you'd
> describe will end up having to re-implement bug-for-bug compatible
> versions of some subset of the option parsing in revision.c.

The subset of option parsing is "a starting revision" and "a base tree"
and _perhaps_ "is the diff recursive or not?" (and this last one isn't
even in revision.c yet). For a subset that small, using revision.c's
parsing does not seem helpful at all.

> Isn't a way to get the best of both worlds to have a small snippet of
> code that inspects the "struct rev_info" before & after
> setup_revisions(), and which would only implement certain optimizations
> if certain known options are provided, but not if any unknown ones are?
>
> That way those who'd like the faster happy path could use that subset of
> options, while the general API would allow any revision options. We'd
> then error() or BUG() out only if we fail to map our expected paths to
> OIDs.

This option requires examining the long and ever-growing list of
options to struct rev_info, which will take much more work than parsing
a starting ref and a path from the command-line.

> I think those are all good ways forward here, and I'd much prefer those
> to having to re-implement or pull out subsets of the current option
> parsing logic in revision.c. What do you think?

I think you are glossing over the difficult part about upstreaming the
blame-tree command, which is the biggest reason we have not done it in
the past.

The way it is implemented in our fork started with this "just parse
args using revision.c" because that's the easiest way to write the
naive implementation, but we were able to make optimizations on top
only because we had full control over the callers not using any other
options. We would not have been able to make the assumptions that
allowed those performance enhancements without that control.

Actually building the interface in a way that guarantees the behavior
will be stable and understood is not easy, but is worth doing well.

Thanks,
-Stolee