Hi Han-Wen, On Thu, 30 Apr 2020, Han-Wen Nienhuys wrote: > On Thu, Apr 30, 2020 at 1:22 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > > > Hello, > > > > Here is today's test coverage report. > > > > It appears that _all_ of the reftable code shows up in this report as > > untested. Perhaps that is the state of the world right now, or perhaps > > I need to initialize a GIT_TEST_ environment variable to ensure it runs? > > > In either way, do we have any test coverage of that massive contribution? > > Please take a look at the online report [1] for that part of the report, > > as I snipped it out of this email. > > (again in plain text) > > The reftable code is currently exercised in a small way by > t0031-reftable.sh Very small, and yet even the simple change I made had uncovered a segmentation fault that the unit tests you refer to below had not. Which makes me think even more than I already thought that it would have made a lot more sense to develop the reftable library inside of git.git instead of completely separate from it. That would also have staved off the many, many style issues that make this thing hard to review, not to mention the sheer size that threatens to dull any mind pouring over the diff. It is probably not a secret by now that these issues keep me from reviewing the code. I mean, I want the reftables feature to finally address the problem on Windows where branch names are sometimes treated as case sensitive, but then all of a sudden they are not (because of the file system backend). So there is motivation on my side to help the patches along. Yet, the way the patches are presented does not invite me to do so. In a private conversation I had recently, the situation was likened to having a neat home that you really like and take care of because you, well, care, and therefore you ask guests to take off their shoes, but one of the guests just ignores the local customs and walks in with their muddy shoes, right over your white carpet. It is a bit of a crass picture that was painted there because the situation is not _all_ that bad. Yet... it _is_ very unenticing to have the coding style stepped all over, as well as the convention to "tell a story with your patches and commit messages" not exactly followed even to the letter "d" (let alone "t"). I understand that it seemed easier to first implement a library in a familiar language (Go) and then "port" it (where the "port" part shows by using constructs that are uncommon in C). However, when I think about the potential users, there really are only two: Git and libgit2. And since libgit2 is relatively similar in origin to Git, I don't think it was the best idea to start with an abstraction layer over data types. It would have been a lot easier to build the code in git.git first, test the heck out of it, and then abstract it _just_ enough to let libgit2 essentially use a copy (just like it does with the libxdiff code). Speaking of testing: there is no doubt in my mind that _iff_ these patches make it into git.git, then the vast majority of bug fixes will flow _from_ git.git. And if that is indeed the case, it makes it very, very awkward to ask every contributor to go to _another_ project to get those fixes in there first, and only then have them trickle back to the actual user. In light of this, I wonder whether there is anything you could do to change the way you develop this patch series that would make it more in line with the code contributions the Git project enjoys? Or at least to tell a much more elegant story arc in the patch series and abide by the coding conventions? I could imagine that just throwing away all that data type abstraction and reusing what is in git.git, including all the helper functions, would go a long way of not only simplifying the structure of the patch series to allow for a meaningful review (in the current form, I don't think that _any_ human could possibly verify the correctness, as the complexity and the size is just too daunting and too overwhelming), but it would also avoid implementing redundant functionality, which would not only reduce the size, but would also let us rely on tried-and-tested implementation in git.git. Just to name an example, pretty much the entire `basics.c` could go away. To give you a concrete data point for tried-and-tested function: the `string_list_split()` function has a short and sweet implementation that has not had to be fixed in more than seven years. That is quite a track record. Using this function instead of `parse_names()` would give us the instant benefit of relying on something we do _not_ have to review for correctness. And there is the distinct possibility that using a `string_list` with an explicit length (instead of throwing it away in `parse_names()` and then painstakingly re-calculating it in `names_length()`) might actually improve performance, too. I firmly believe that this patch series, in particular the huge patch, can be transformed into something that looks a lot more like git.git code, is a lot smaller, and is a _lot_ easier to review. Without the confidence such a reviewable shape provides, I would actually not trust it enough to put it into the hands of end-users. > The upstream library has unittests, see > https://cs.bazel.build/search?q=r%3Areftable+f%3A_test.c&num=0, but I > believe Git doesn't do unittests? That is a misconception. We usually implement some test helper in `t/helper/` (or a new subcommand in one test helper) and run that as part of the test suite. And to repeat my point above: I don't think that I would have found a segfault _immediately_ after modifying t0031 in a rather trivial way if this library was developed _inside_ git.git rather than outside of it, with an incremental story accompanied by unit tests. At the end of the day, it matters more that the reftable works in Git than it matters that it passes the unit tests when compiled somewhere else. > I'll try to generate a coverage report for them. > > You can set GIT_TEST_REFTABLE to see coverage for reftable, but I'm > not sure how useful it is given that ~15% of the tests fail. 15%? That's a lot... I hope that that's mostly test cases that incorrectly assume that they can access refs directly on disk, in loose format. Ciao, Dscho