Re: [RFC][GSoC] Proposal v2: Move and improve reftable tests in the unit testing framework

Christian Couder <christian.couder@xxxxxxxxx> · Tue, 2 Apr 2024 12:01:59 +0200

On Fri, Mar 29, 2024 at 8:08 AM Chandra Pratap
<chandrapratap3519@xxxxxxxxx> wrote:
>
> Thanks for the feedback, Christian and Patrick! With your advice, I
> decided to split my original proposal into two to conform to what was
> suggested by the SoC 2024 Ideas page.
>
> This is the proposal for the reftable tests migration project. However,
> I am unsure of what would be a good project size for this project.
> I have quite a long summer vacation and don't really have any other
> plans other than GSoC as of now so I decided to go with large project
> size on GSoC's website but please let me know if another size would
> be more appropriate.

On our Idea page, we have "Expected Project Size: 175 hours or 350
hours" for this project. So it's fine for us if you pick one of those
(medium or large).

> Before GSoC
> -----------
>
> -----Synopsis-----
>
> A new unit testing framework was introduced to the Git mailing list last
> year with the aim of simplifying testing and improving maintainability
> by moving the current testing setup from shell scripts and helper files
> to a framework written wholly in C. The idea was accepted and merged
> into master on 09/11/2023. The choice of testing framework and the
> rationale behind the choice is thoroughly described in
> Documentation/technical/unit-tests.txt.
>
> This project aims to extend that work by moving the reftable tests from
> the current setup to the new unit testing framework and improving the
> tests themselves. The difficulty for the project should be medium
> and it should take somewhat around 175-350 hours.

It's fine to give information about the project that is already in the
Idea page (or even to just copy from that page), but we are more
interested to know how you approach the project, and how you want to
work on it. So if you give information that is already in the Idea
page, please say it clearly, so we can easily skip reading that if we
don't have much time.

As we prefer that you give information specific to your approach, I
think it would have been more interesting to say that you decided to
go with a "large" project size for example.

> In GSoC
> -------
>
> -----Background for reftable-----
>
> Git’s internals consist of mainly three objects: blobs, tree objects and
> commit objects. The blobs and tree objects are responsible for storing a
> repository’s content while the commit objects store information about
> commits in the repo and are responsible for capturing the repo’s
> history.

When explaining refs with as much detail, I think tag objects are
important too, as refs often point to them too. On the contrary, blobs
and tree objects are very rarely pointed to by refs. So it's a bit
strange that you don't talk about tag objects, but that you talk about
blobs and trees which are less important than tag objects in this
context.

> Every one of these objects can be accessed through a unique key
> generated by a SHA-256 (previously SHA-1) algorithm.

Actually both SHA-256 and SHA-1 are supported.

> To make life
> easier, instead of remembering the hash key for commit objects, we can
> assign a simple name to them, store these names in a file and use that
> file whenever we need access to the commits. These names are called
> ‘references’ or ‘refs’.
>
> Since a repository can contain a lot of commits and branches and hence,
> a lot of refs, Git used packed-refs to save space by storing unused refs
> in a single file. However, this arrangement doesn’t scale well in terms
> of both space and performance. This is where reftable comes in. A
> reftable file is a portable binary file format customized for storing
> references. Some objectives of reftable are:
> -  Sorted references enabling advanced scans like binary search.
> -  Near constant time lookup for any single reference.
> -  Efficient enumeration of an entire namespace like refs/tags/
> -  Combined reflog storage with ref storage for small transactions and
>    separate reflog storage for base refs and historical logs.
> -  Near constant time verification if an object name is referred to by at
>    least one reference.
>
> -----Plan-----
>
> The reftable tests are different from other tests in the test directory

Not sure which "test directory" you are talking about. I guess it's "t/".

> because they perform unit testing with the help of a custom test framework
> rather than the usual ‘helper file + shell script’ combination.

There are actually already unit tests in t/unit-tests which don't need
a shell script.

I think it would have been clearer if this paragraph was started with
something like "Compared to unit tests that use both a "t/*.sh" test
script in shell and a test-tool helper in "t/helper/", ..."

> Reftable tests do have a helper file and a shell script invoking the
> helper file, but rather than performing the tests, this combination is
> used to invoke tests defined in the reftable directory.

It looks like you are talking about the "reftable/*_test.c" files, but
I think it would be clearer if you spelled that out.

> The reftable directory consists of nine tests:

I think using "contains" instead of "consists" would be a bit better,
as there are other things than the nine tests in the "reftable/"
directory.

>
> •  basics test
> •  record test
> •  block test
> •  tree test
> •  pq test
> •  stack test
> •  merged test
> •  refname test
> •  read-write test

Yeah, this corresponds to the output of `ls reftable/*_test.c`.

> Each of these tests is written in C using a custom reftable testing
> framework defined by reftable/test_framework (also written in C).

Using "reftable/test_framework.{c,h}" would be a bit clearer as there
is no "reftable/test_framework" file.

> Since the reftable test framework is written in C like the unit testing
> framework, we can create a direct translation of the features mentioned
> above using the existing tools in the unit testing framework with the
> following plan:
>
> •  EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0).
>
> •  EXPECT_STREQ(a, b): Can be replaced by check_str(a, b).
>
> •  EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR.
>    E.g. expect(a >= b) --> check_int(a, “>=”, b)
>
> •  RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”).
>
> The information contained in the diagnostic messages of these macros is
> replicated in the unit testing framework by default. Any additional
> information can be displayed using the test_msg() functionality in the
> framework. The additional functions set_test_hash() and strbuf_add_void()
> may be moved to reftable/stack.c and reftable/refname.c respectively.
>
> The plan itself is basic and does need improvements, but using this plan,
> I have already created a working albeit primitive copy for two out of the
> nine tests (basics test and tree test) as can be seen here:
> (link: https://github.com/gitgitgadget/git/pull/1698)

Nice, but it looks like "unit-tests/bin/t-reftable-tree" sometimes
fails, perhaps because it's missing a call to test_done().

The rest of your proposal LGTM.

Thanks!