Hi Junio, On Fri, 2 Oct 2020, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes: > > >> This commit provides basic utility classes for the reftable library. > >> > >> Since the reftable library must compile standalone, there may be some overlap > >> with git-core utility functions. > > > > My position on this idea to duplicate functionality in order to somehow > > pretend that the reftable code is independent of Git's source code has not > > changed. > > The above may be sufficient between you and Han-Wen, and a few > selected others who still remember, but please do not forget that we > are collaborating in the open. It would help those who are learning > open source interactions by watching from sidelines, if you spelled > out what your position is or at least left a pointer to a previous > discussion. You're right, sorry. > FWIW, I think it is a mistake to try to make this directory so that > it can be lifted out of our code base and used independently, as it > has to create unnecessary friction to the code when used here. It > is not like other code that we are not their primary intended > audience and we simply "borrow" from them (e.g. xdiff/ & sha1dc/). > > The previous paragraph agrees with my guess of your position, but > perhaps you have something else in mind. I dunno. Yes, you were spot on. I am rather strongly opposed to introducing duplicated functionality. Rather than spending time on finding half a dozen links to conversations in the past, let me try to make the case afresh: I have a very concrete example of the undesirable consequences: the commit-graph breakage Stolee and I debugged yesterday, where a bug hid in `in_merge_bases_many()` for two years, hidden by the fact that there was _duplicate logic_ in `get_reachable_subset()`. Pretty much _all_ of the code in `reftable/` is _very_ new, has not seen any real-world testing, and as such cannot really be trusted, in particular the code that duplicates functionality already present in libgit.a. Note that I do not claim that Han-Wen cannot be trusted: to the contrary, I trust him very much (we have a history going all the way back to Lilypond, the musical typesetter). The code, however, cannot be trusted, and needs to be reviewed and tested in practice. The simple fact is that even the thorough review that the commit-graph patches received did not prevent the `min_generation`/`max_generation` bug in `in_merge_bases_many()` from entering the code base, and if we had not had _another_ function doing very, very similar things, that bug would most likely not have lingered for _this_ long. Likewise for the reimplementation of the convenient functionality like `strbuf`. This kind of duplication (in the case of `struct strbuf`, quite _literally_: there are now confusingly _two_ `strbuf.h` files that even try to implement the _exact same_ API) is _prone_ to lead to stale, or long undiscovered, bugs in one of the duplicated implementations. Which means that we're likely to introduce bugs with those new functions introduced in `reftable/`. For the reftable functionality, that has to be accepted. For functions that do the same as equivalents in libgit.a, we do not have to accept it. In any case, duplicated functionality is a maintenance burden, and especially in this case, an unnecessary one at that: unlike xdiff or compat/regex/, the reftable functionality is co-dependent with `libgit.a`. Yes, you can implement libreftable as a stand-alone library (including duplicated functionality from libgit.a, as I pointed out), but you _cannot use it independently of Git_. The reftable concerns _Git_ refs. This is not a compressor that could come in handy in a messenger app, or a diff engine that could potentially help with a semantic merging tool. It is code whose entire purpose is to manage boatloads of Git refs efficiently. Therefore, I see no convincing reason to insist on maintaining this code outside of git/git. It would make more sense to maintain parse-options or strbuf or t/test-lib.sh outside of git/git (because at least that functionality is independent enough of Git to be potentially useful to other projects) and we don't. We maintain them inside git/git, and for good reason: it makes development and maintenance a hell of a lot easier. And once we agree that reftable is intended to be so integral a part of Git that it should absolutely have its home inside git/git, all of that duplicated functionality (with almost totally untested implementations, at least when it comes to "battle testing in production") can totally evaporate and does not need to worry us any more. And then we can start to spend reviewers' time on the actual code revolving around the actual reftables rather than having to bother with reviewing, say, a close, but non-identical `strbuf` that might even have "borrowed" parts of the `strbuf` code and put it under a more permissive license (I don't know, and I don't want to know, that's part of the reason why I don't want to review the reftable patches in their current form). In other words, we can then finally start to actually review the reftable code on its merits of teaching Git a valuable new feature. > > Be that as it may, the CI build failures impacted my notifications' > > signal/noise ratio too much, so here goes (Junio, I would be delighted if > > you could apply this on top of your branch): > > Quite honestly, this close to the first preview release for 2.29, > I'd rather drop this round from 'seen' and expect an updated topic, > than piling fixup on top of fixups. Sounds good to me. Ciao, Dscho