Re: [PATCH v2 05/13] reftable: utility functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Junio,

On Fri, 2 Oct 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
> >> This commit provides basic utility classes for the reftable library.
> >>
> >> Since the reftable library must compile standalone, there may be some overlap
> >> with git-core utility functions.
> >
> > My position on this idea to duplicate functionality in order to somehow
> > pretend that the reftable code is independent of Git's source code has not
> > changed.
>
> The above may be sufficient between you and Han-Wen, and a few
> selected others who still remember, but please do not forget that we
> are collaborating in the open.  It would help those who are learning
> open source interactions by watching from sidelines, if you spelled
> out what your position is or at least left a pointer to a previous
> discussion.

You're right, sorry.

> FWIW, I think it is a mistake to try to make this directory so that
> it can be lifted out of our code base and used independently, as it
> has to create unnecessary friction to the code when used here.  It
> is not like other code that we are not their primary intended
> audience and we simply "borrow" from them (e.g. xdiff/ & sha1dc/).
>
> The previous paragraph agrees with my guess of your position, but
> perhaps you have something else in mind.  I dunno.

Yes, you were spot on.

I am rather strongly opposed to introducing duplicated functionality.
Rather than spending time on finding half a dozen links to conversations
in the past, let me try to make the case afresh:

I have a very concrete example of the undesirable consequences: the
commit-graph breakage Stolee and I debugged yesterday, where a bug hid in
`in_merge_bases_many()` for two years, hidden by the fact that there was
_duplicate logic_ in `get_reachable_subset()`.

Pretty much _all_ of the code in `reftable/` is _very_ new, has not seen
any real-world testing, and as such cannot really be trusted, in
particular the code that duplicates functionality already present in
libgit.a. Note that I do not claim that Han-Wen cannot be trusted: to the
contrary, I trust him very much (we have a history going all the way back
to Lilypond, the musical typesetter). The code, however, cannot be
trusted, and needs to be reviewed and tested in practice.

The simple fact is that even the thorough review that the commit-graph
patches received did not prevent the `min_generation`/`max_generation` bug
in `in_merge_bases_many()` from entering the code base, and if we had not
had _another_ function doing very, very similar things, that bug would
most likely not have lingered for _this_ long.

Likewise for the reimplementation of the convenient functionality like
`strbuf`. This kind of duplication (in the case of `struct strbuf`, quite
_literally_: there are now confusingly _two_ `strbuf.h` files that even
try to implement the _exact same_ API) is _prone_ to lead to stale, or
long undiscovered, bugs in one of the duplicated implementations.

Which means that we're likely to introduce bugs with those new functions
introduced in `reftable/`. For the reftable functionality, that has to be
accepted. For functions that do the same as equivalents in libgit.a, we do
not have to accept it.

In any case, duplicated functionality is a maintenance burden, and
especially in this case, an unnecessary one at that: unlike xdiff or
compat/regex/, the reftable functionality is co-dependent with `libgit.a`.
Yes, you can implement libreftable as a stand-alone library (including
duplicated functionality from libgit.a, as I pointed out), but you _cannot
use it independently of Git_.

The reftable concerns _Git_ refs.

This is not a compressor that could come in handy in a messenger app, or a
diff engine that could potentially help with a semantic merging tool.

It is code whose entire purpose is to manage boatloads of Git refs
efficiently. Therefore, I see no convincing reason to insist on
maintaining this code outside of git/git.

It would make more sense to maintain parse-options or strbuf or
t/test-lib.sh outside of git/git (because at least that functionality is
independent enough of Git to be potentially useful to other projects) and
we don't. We maintain them inside git/git, and for good reason: it makes
development and maintenance a hell of a lot easier.

And once we agree that reftable is intended to be so integral a part of
Git that it should absolutely have its home inside git/git, all of that
duplicated functionality (with almost totally untested implementations, at
least when it comes to "battle testing in production") can totally
evaporate and does not need to worry us any more.

And then we can start to spend reviewers' time on the actual code
revolving around the actual reftables rather than having to bother with
reviewing, say, a close, but non-identical `strbuf` that might even have
"borrowed" parts of the `strbuf` code and put it under a more permissive
license (I don't know, and I don't want to know, that's part of the reason
why I don't want to review the reftable patches in their current form).

In other words, we can then finally start to actually review the reftable
code on its merits of teaching Git a valuable new feature.

> > Be that as it may, the CI build failures impacted my notifications'
> > signal/noise ratio too much, so here goes (Junio, I would be delighted if
> > you could apply this on top of your branch):
>
> Quite honestly, this close to the first preview release for 2.29,
> I'd rather drop this round from 'seen' and expect an updated topic,
> than piling fixup on top of fixups.

Sounds good to me.

Ciao,
Dscho




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux