Re: [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 19, 2024 at 6:11 PM Chandra Pratap
<chandrapratap3519@xxxxxxxxx> wrote:
>
> This is my project proposal for the Google Summer of Code 2024 program.
> The document version of this proposal can be accessed through this link:
> https://shorturl.at/ijrTU
>
> ---------<8----------<8----------<8----------<8----------<8----------<8
>
> Personal Info
> -------------
>
> Full name: Chandra Pratap
> Preferred name: Chand
>
> E-mail: chandrapratap3519@xxxxxxxxx
> Phone: (+91)77618-24030
>
> Education: SV National Institute of Technology, Surat, India
> Year: Sophomore (2nd)
> Major: Mathematics
>
> GitHub: https://github.com/Chand-ra
>
> Before GSoC
> -----------
>
> -----Synopsis-----
>
> A new unit testing framework was introduced to the Git mailing list last
> year with the aim of simplifying testing and improving maintainability.
> The idea was accepted and merged into master on 09/11/2023. This project
> aims to extend that work by moving more tests from the current setup to
> the new unit testing framework.
>
> The SoC 2024 Ideas page (link: https://git.github.io/SoC-2019-Ideas/)
> mentions reftable unit tests migration as a separate project from the
> general unit test migration project, however, I propose migrating other
> tests alongside the reftable unit tests as a part of this proposal.

It means that if we select your proposal, we cannot select someone
else to work on either the "Move existing tests to a unit testing
framework" project or the "Convert reftable unit tests to use the unit
testing framework" project.

I am not sure but I think that, after migrating all the reftable unit
tests, I would prefer you working on other reftable related tasks
rather than on more unit test migrations.

> The reasoning behind this is explained further down.
> The difficulty for the project should be medium and it should take
> somewhat between 175 to 350 hours.
>
> -----Contributions-----
>
> • apply.c: make git apply respect core.fileMode settings
> -> Status: merged into master
> -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@xxxxxxxxx/

A link to (or the hash of) the commit that merged your patch into the
master branch would be nice.

> -> Description: When applying a patch that adds an executable file, git
> apply ignores the core.fileMode setting (core.fileMode specifies whether
> the executable bit on files in the working tree are to be honored or not)
> resulting in false warnings. Fix this by inferring the correct file mode
> from the existing index entry when core.filemode is set to false. Add a
> test case that verifies the change and prevents future regression.
>
> -> Remarks: This was the first patch I worked on as a contributor to Git.
> Served me as an essential intro lesson to the community’s working flow and
> general practices.
>
> • tests: Move t0009-prio-queue.sh to the unit testing framework
> -> Status: merged into master
> -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@xxxxxxxxx/

I don't think it's a good idea to submit patches related to a project
we propose before the GSoC actually starts. We might have been able to
detect this earlier if you added [GSoC] to the patch titles.

Also a link to (or the hash of) the commit that merged your patch into
the master branch would be nice.

> -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c
> unit test Git's implementation of a priority queue. Migrate the test
> over to the new unit testing framework to simplify debugging and reduce
> test run-time.
>
> -> Remarks: Perhaps the most relevant patch of all the ones mentioned
> here, this patch helped me understand the expectations and workflow for
> the work to be performed in this project.
>
> • write-or-die: make GIT_FLUSH a Boolean environment variable
> -> Status: merged into master
> -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@xxxxxxxxx/

Also a link to (or the hash of) the commit that merged your patch into
the master branch would be nice.

> -> Description: Among Git's environment variable, the ones marked as

s/variable/variables/

> "Boolean" accept values like Boolean configuration variables, i.e.
> values like 'yes', 'on', 'true' and positive numbers are taken as "on"
> and  values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH
> accept more values besides '0' and '1' by turning it into a Boolean
> environment variable & update the related documentation.
>
> • sideband.c: remoye redundant NEEDSWORK tag

s/remoye/remove/

> -> Status: merged into master
> -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@xxxxxxxxx/

Also a link to (or the hash of) the commit that merged your patch into
the master branch would be nice.

> -> Description: Replace a misleading NEEDSWORK tag in sideband.c that
> reads ‘replace int with size_t’ with another comment explaining why it
> is fine to use ‘int’ and the replacement isn’t necessary.
>
> • make tig callable from PowerShell/Command Prompt
> -> Status: merged into main
> -> link: https://github.com/git-for-windows/MINGW-packages/pull/104
>
> -> Description: Tig is a text mode interface for Git that ships with the
> standard Git for Windows package but isn’t callable from PowerShell/
> Command Prompt by default. Fix this by updating the relevant Makefiles
> and resource scripts.
>
> • fix broken link on Git for Windows’ GitHub wiki
> -> Status: merged
> -> link: https://github.com/git-for-windows/git/wiki/Home/_history
>
> -> Remarks: A simple fix for a broken link that I stumbled upon while
> browsing Git for Windows’ wiki looking for some help with the patch
> mentioned just before this one.
>
> • t4129: prevent loss of exit codes due to the use of pipes
> -> Status: merged into master
> -> link: https://lore.kernel.org/git/20220311132141.1817-1-shaoxuan.yuan02@xxxxxxxxx/

The link shows a patch from 2022 by Shaoxuan Yuan. Are you sure this
is the right link?

> -> Description: Piping the output of git commands like git-ls-files to
> another command (grep in this case) in t4129 hides the exit code returned
> by these commands. Prevent this by storing the output of git-ls-files to
> a temporary file and then "grep-ping" from that file. Replace grep with
> test_grep as the latter is more verbose when it fails.
>
> • t9146: replace test -d/-f with appropriate test_path_is_* function
> -> Status: merged into master
> -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@xxxxxxxxx/

Also a link to (or the hash of) the commit that merged your patch into
the master branch would be nice.

> -> Description: The helper functions test_path_is_* provide better debugging
> information than test -d/-e/-f.
> Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls.
>
> • regex: update relevant files in compat/regex
> -> Status: WIP
> -> link: https://github.com/gitgitgadget/git/pull/1641

This is a GitGitGadget patch. Please mention that. Same thing for some
other contributions above that are not part of Git.

> -> Description: The RegEx code in compat/regex has been vendored from
> gawk and was last updated in 2010. This may lead to performance issues
> like high CPU usage. Fix this by synchronizing the relevant files in
> compat/regex with the latest version from GNULib and then replaying any
> changes we made to gawk’s version on top of the new files.
>
> -> Remarks: When I started working on this patch, I thought it was an
> easy fix but the work turned out to be more involved than I anticipated.
> I had to seek help from the other community members, and we have made
> some good progress, but there is still a lot of cleaning and changes that
> need to be done. I haven’t found enough time to commit to this again,
> but it’s surely something that I want to get done soon.
>
> • tests: Move t0032-reftable-unittest.sh to the unit testing framework
> -> Status: WIP
> -> link: https://github.com/gitgitgadget/git/pull/1698
>
> -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c
> unit test Git’s reftable framework. Migrate the test over to the new
> unit testing framework to simplify debugging and reduce test run-time.
>
> -> Remarks: An infant version as of now, I tinkered with this after
> seeing the project list on 'Git SoC 2024 Ideas' page to get an idea of
> the kind of work that will be involved in this project.

It's Ok to tinker and to mention this. I hope it helped you write this proposal.

> -----Related Work-----
>
> Prior works about the idea have been performed by other community members
> and previous interns which form a good guiding path for my own approach.
> Some previous example work:
>
> i) Port helper/test-ctype.c to unit-tests/t-ctype.c
> -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@xxxxxxxxx/
>
> ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c
> -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@xxxxxxxxx/
>
> iii) Port helper/test-date.c to unit-tests/t-date.c
> -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@xxxxxxxxx/
>
> iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c
> -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@xxxxxxxxx/
>
> v) Integrate a simple strbuf unit test with Git's Makefiles
> -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@xxxxxxxxxx/
>
> vi) t0080: turn t-basic unit test into a helper
> -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@xxxxxxxxxx/

Thanks for mentioning these.

> In GSoC
> -------
>
> -----Plan-----
>
> -> Reftable tests:
>
> The reftable tests are different from other tests in the test directory
> because they perform unit testing with the help of a custom test framework
> rather than the usual ‘helper file + shell script’ combination.
> Reftable tests do have a helper file and a shell script invoking the
> helper file, but rather than performing the tests, this combination is
> used to invoke tests defined in the reftable directory.
>     The reftable directory consists of nine tests:
>
> •  basics test
> •  record test
> •  block test
> •  tree test
> •  pq test
> •  stack test
> •  merged test
> •  refname test
> •  read-write test
>
> Each of these tests is written in C using a custom reftable testing
> framework defined by reftable/test_framework (also written in C). The
> framework has four major features utilized in performing the tests:
>
> •  EXPECT_ERR(c): A function-like macro that takes as input an integer
> ‘c’ (generally the return value of a function call), compares it against
> 0 and spits an error message if equality doesn’t hold. The error message
> itself contains information about the file where this macro was used,
> the line in this file where the macro was called and the error code ‘c’
> causing the error.
>
> •  EXPECT_STREQ(a, b): A function-like macro that takes as input two
> strings ‘a’ and ‘b’, compares them for equality via strcmp() and throws an
> error if equality doesn’t hold. The error message thrown contains information
> regarding the file where this macro was invoked, the line in this
> file where the macro was called and the mismatched strings ‘a’ and ‘b’.
>
> •  EXPECT(c): A function-like macro that takes as input an integer ‘c’
> (generally the result of a Boolean expression like a == b) and throws an
> error message if c == 0. The error message is similar to EXPECT_ERR(c).
>
> •  RUN_TEST(f): A function-like macro that takes as input the name of a
> function ‘f’ (a test function that exercises a part of reftable’s code),
> prints to stdout the message ‘running f’ and then calls the function with f().
>
> Other than these, the framework consists of two additional functions,
> set_test_hash() and strbuf_add_void() which are used  exclusively in the
> stack tests and refname tests respectively.
>
> Since the reftable test framework is written in C like the unit testing
> framework, we can create a direct translation of the features mentioned
> above using the existing tools in the unit testing framework with the
> following plan:
>
> •  EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0).
>
> •  EXPECT_STREQ(a, b): Can be replaced by check_str(a, b).
>
> •  EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR.
>    E.g. expect(a >= b) --> check_int(a, “>=”, b)
>
> •  RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”).
>
> The information contained in the diagnostic messages of these macros is
> replicated in the unit testing framework by default. Any additional
> information can be displayed using the test_msg() functionality in the
> framework. The additional functions set_test_hash() and strbuf_add_void()
> may be moved to the source files for stack and refname respectively.

You mean "reftable/stack.c" and "reftable/refname.c"?


> The plan itself is basic and might need some modifications, but using
> the above plan, I have already created a working albeit primitive copy for
> two out of the nine tests (basics test and tree test) as can be seen here:
> (link: https://github.com/gitgitgadget/git/pull/1698)
>
> -> Other tests:
>
> As is already mentioned, the rest of the tests in the test directory use the
> combination of a helper file (written in C) and a shell script that invokes
> the said helper file. I will use my work from the patch
> ‘tests: Move t0009-prio-queue.sh to the unit testing framework’
> (link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@xxxxxxxxx/)
> to explain the steps involved in the porting of such tests:
>
> • Search for a suitable test to port: As Christian Couder mentioned in this
> mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@xxxxxxxxxxxxxx/),
> there exists a subset of t/helper worth porting and we need some sort of
> classification to discern these.
>
> All helper files contain a cmd__foo() function which acts as the entry
> point for that helper tool. For example, the helper/test-prio-queue.c
> file contained cmd__prio_queue() which served as the entry point for the
> prio-queue tool. This function is then mapped to a different name by
> helper/test-tool.c which is used by the ‘*.sh’ files to perform tests.
> Continuing the prior example, cmd__prio_queue() had been mapped to
> “prio-queue” by test-tool.c and t0009-prio-queue.sh invoked it like
> “prio-queue 1 2 get 3 dump”.
>
> To classify what among t/helper should be targeted first in this project
> we can use something like ‘git grep foo’ (where foo is the alias for cmd__foo())
> to look at the instances where the helper tool is invoked. The ones appearing
> lesser in different test scripts are the ones most likely to be used solely
> for unit testing and should probably be targeted first.
> Utilising this strategy, I discovered that the ‘prio-queue’ tool was only
> used in t0009-prio-queue.sh and hence, was a good candidate for the unit
> testing framework.
>
> Note that this strategy is not full-proof and further investigation is
> absolutely required on a per-test basis, it is only meant to give an
> initial idea of what’s worth investigating.
>
> •  Create a new C test file in t/unit-tests: After finding a test appropriate
> for the migration efforts, we create a new ‘*.c’ file in t/unit-tests.
> The test file must be named appropriately to reflect the nature of the
> tests it is supposed to perform. Most of the times, replacing ‘tXXXX’
> with ‘t-‘ and ‘*.sh’ with ‘.c’ in the name of the test script suffices.
> E.g. t/t0009-prio-queue.sh turns to t/unit-tests/t-prio-queue.c. The
> new C file must #include “test-lib.h” (to be able to use the unit
> testing framework) and other necessary headers files.
>
> •  Move the code from the helper file: Since the helper files are written
> in C, this step is mostly a ‘copy-paste then rename’ job. Changes similar
> to the following also need to be made in the Makefile:
> -    TEST_BUILTINS_OBJS += test-prio-queue.o
> +    UNIT_TEST_PROGRAMS += t-prio-queue
>
> •  Translate the shell script: The trickiest part of the plan, since
> different test scripts perform various functions, and a direct translation
> of the scripts to C is not always optimal. Continuing the prior example,
> t0009-prio-queue.sh used a single pattern for testing, write expected
> output to a temporary file (named ‘expect’) -> feed input to the ‘prio-queue’
> helper tool -> dump its output to another temporary file (named ‘actual’)
> -> compare the two files (‘actual’ vs ‘expect’).
>
> In the first iteration of my prio-queue patch, I worked out a
> straightforward translation of this pattern in C. I stored the input in
> a string buffer, passed that buffer to the test function and stored its
> output in another buffer, and then called memcmp() on these two buffers.
> While this did prove to be a working copy, this work was found to be inadequate
> on the mailing list. Through the next several iterations, I reworked the
> logic several times, like comparing the input and output on-the-go rather
> than using buffers and replacing strings with macro definitions.
>
> The test scripts similarly perform other functions like checking for
> prerequisites, creating commits, initializing repositories, changing or
> creating directories and so forth, and custom logic is required in most
> of the cases of translating these, as seen above.
>
> •  Run the resulting test, correct any errors: It is rare for any migrated
> test to work correctly on the first run. This step involves resolving any
> compile/runtime errors arising from the test and making sure that at the
> very minimum, all the test-cases of the original test are replicated in the
> new work. Improvements upon the original can also be made, for example, the
> original t0009-prio-queue.sh did not exercise the reverse stack
> functionality of prio-queue, which I improved upon in unit-tests/t-prio-queue.
>
> •  Send the resulting patch to the mailing list, respond to the feedback:
> This step involves writing a meaningful commit message explaining each patch
> in the series. From my experience contributing to the Git project, I find it
> to be rare for any patch series to be accepted in the very first iteration.
> Feedback from the community is vital for the refinement of any patch and
> must be addressed by performing the suggested changes and sending the work
> back to the mailing list. This must be repeated until the work is merged
> into ‘seen’, ‘next’ and further down, ‘master’.
>
>
> Timeline
> --------
>
> I’m confident that I can start the project as early as the start of the
> Community Bonding Period (May 1 - 26). This is because I have read
> the related documentation and already have some experience with the idea.
> I believe I’ll be ready to get up to speed to work on the project by then.
> The exact time arrangement for each test is variable and hard to determine,
> but judging from the fact that it took me 3-4 days to complete the first
> iteration of the t-prio-queue work, here is a proposed migration schedule:
>
> • Reftable tests:
>
> If my current work from 'tests: Move t0032-reftable-unittest.sh to the unit
> testing framework' is found satisfactory, there are 7 tests left that need
> to be ported to the unit testing framework. Assuming it takes 2-3 days to
> port one test, I should be done with the first patch series for the reftable
> tests in about 2-3 weeks. From there, it’s a matter of responding to the
> feedback from the mailing list, which can deceptively take longer than expected.
> For instance, I had to continue polishing my t-prio-queue patch for about
> two weeks after the feedback on the first iteration.
>
> • Other tests:
>
> The time required to port these tests is highly variable and depends mostly
> upon the work required in translating the shell script. As mentioned
> previously, it took me 3-4 days to complete the first iteration of the
> test-prio-queue migration patch, and that was a short test with only about
> 50 or so lines of shell scripting and all the test cases following a single
> pattern. Considering all this, I believe it should be possible, on average,
> to migrate a new test in 5-8 days.
>
> Hence, it should be possible for me to migrate >=7 tests along with the
> reftable tests throughout the duration of this project.
>
> Availability
> ------------
>
> My summertime is reserved for GSoC, so I expect that I can work on a new
> test 5 days per week, 6-8 hours per day, that is 30-40 hours a week.
> On the weekends, I would like to solely work on the feedback from
> the mailing list and advance my [WIP] patches. Certainly, something
> unexpected may arise, but I will try my best to adhere to this time
> commitment and be always available through the community’s mailing list.
>
> Post GSoC & Closing Remarks
> ---------------------------
>
> When I first started contributing to the Git project in October of 2023,
> I had no idea about programmes like GSoC. I was advised by a senior of
> mine to contribute to open-source projects and hence, my aim of contribution
> was to apply what I had learnt in college to solve real-world problems
> and learn from more experienced peers. However, most of what I have
> contributed to Git has been trivial owing to my lack of skills and
> inexperience with the project.
>
> Seeing how I need to do an internship in summer, with GSoC, I hope to be
> able to dedicate this internship time and effort to a cool project like
> Git while simultaneously learning skills to be able to make more useful
> contributions in the future. It’s two birds with one stone. I would also
> like to keep working on this project to see it to completion post-GSoC
> and help mentor other newcomers get started with the Git project.


> -- After GSoC --
> ---------------------
>
> After GSoC I intend to be a part of the community and keep
> contributing to the git’s codebase. I eagerly want to learn a lot from
> the git community and enhance my skills. I also see myself making
> important contributions to git in the future.
>
> When I first joined the community 2 months ago, the ancient way of
> collaborating through a mailing list by sending diff patches was
> really puzzling (GitHub was the only means that I knew about for
> open-source collaboration). After sending a patch I got a little
> comfortable with it and started loving it.
>
> -- Closing remarks --
> ----------------------------
>
> I am really enthusiastic to contribute to git. I really want to learn
> a lot from my mentors and the whole git community. Everyone
> contributes using git, why not contribute to git :).
>
> In the end I really want to thank the whole community, especially
> Christian and Junio, for their valuable time in reviewing my patches
> and guiding me with the problems I faced.

Thanks for your proposal. Please make sure you submit it soon as a pdf
file to the GSoC website. It can then be updated by uploading a new
pdf until the April 2 1800 UTC deadline.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux