Hello everyone, This is my proposal for the project "Move existing tests to a unit testing framework" in Google Summer of Code 2024. The web version can be viewed at: https://docs.google.com/document/d/1wmosedy-UKd_mhAAjccv1qETO1q00s-npYj2m6g-Hd0/edit?usp=sharing I appreciate any feedback on this proposal. Thanks. -- >8 -- Move existing tests to a unit testing framework Personal Information -------------------- Name: Ghanshyam Thakkar E-mail: shyamthakkar001@xxxxxxxxx Ph. No.: +91 7990974044 Education: L.D. College of Engineering, Gujarat, India Year: IV Degree: Bachelors in Electronics and Communications Engineering GitHub: https://github.com/spectre10 Time-zone: UTC +5:30 (IST) Overview -------- Initially Git used to only rely on shell scripts to test the functionality usually deemed to be internal (e.g. libraries, functions etc.) This was achieved through t/helper binaries which would take input via stdin and output through stdout and the test scripts would consume those outputs. This translation is costly due to the following reasons: - Each test script creates a new git repository which is many times not needed. See here[1] for example. - These scripts spawn processes every time they need a certain output. And spawning processes are expensive (especially on windows). - Difficult setups: sometimes the functions or the libraries we are testing need some special setup such as defining flags for custom behavior etc. which cannot be done through the shell scripts, hence we might forego testing those conditions. Moving to the unit testing framework would give us many benefits such as increased runtime performance, and therefore lower test duration and solves all of the above points. The choice of framework and the rationale behind it is thoroughly described in Documentation/technical/unit-tests.txt. Therefore, the goal of this project is to move existing unit tests from t/helper/* and its corresponding scripts to the new unit testing framework. And there has been work going on to do this by Outreachy intern Achu Luma. The expected project difficulty is medium and the expected project size would be 350 hours. Pre-GSoC -------- I initially got into Git’s codebase in December, 2023 and started my contribution journey in January, 2024. I worked on mostly what I found interesting and within my reach. Following is the list of contributions that I have made, in chronological order: - [PATCH v6 0/2] t7501: add tests for --include, --only, --signoff Status: merged into master Merge Commit: 2be9ccf23e0592b83b78e719574e580e79db0375 Description: This patch series added missing tests for --include, --only and --signoff options of git-commit. While working on this patch I accidentally reproduced a bug where in an untracked path provided to the commit command with -i would not error out. This was noticed by Junio. Therefore, this patch also included a TODO comment describing this bug. Mailing list thread: https://lore.kernel.org/git/20240109060417.1144647-2-shyamthakkar001@xxxxxxxxx - [PATCH v3 1/2] t0024: avoid losing exit status to pipes Status: merged into master Merge Commit: 40225ba8b494080d7f90dd5e129daa7b11c613d1 Description: Although this was not my first contribution, it was done to fulfill the microproject criteria from the list of micro-projects on Git Developer Pages. The first patch replaces the pipe to the redirection operator to store the output to a temporary file to avoid losing exit status of the git commands. And the second patch of the series fixes the style violations to adhere to the current standard described in CodingGuidelines. Mailing list thread: https://lore.kernel.org/git/20240118215407.8609-1-shyamthakkar001@xxxxxxxxx/ - [PATCH v6 0/2] add-patch: classify '@' as a synonym for 'HEAD' Status: merged into master Merge Commit: 65462776c2963b377968e580ace1590238bf79c1 Description: The first patch of this series removed the disparity of command line output of using ‘HEAD’ vs ‘@’ in ‘git reset/restore/checkout -p’. And the second patch removed the perl prerequisites in the test scripts which use the patch mode functionality. Mailing list thread: https://lore.kernel.org/git/20240128181202.986753-2-shyamthakkar001@xxxxxxxxx/ - [PATCH] restore: allow --staged on unborn branch Status: Discontinued Description: This patch added functionality to assume an empty tree when on an unborn branch to be able to use ‘restore –staged’. However, this was discontinued due to lack of community interest. In hindsight, I think that this would have been an acceptable patch, had I described it well. Mailing list thread: https://lore.kernel.org/git/20240206230357.1097505-2-shyamthakkar001@xxxxxxxxx/ - [PATCH] unit-tests: convert t/helper/test-oid-array.c to unit-tests Status: on hold until GSoC Description: This was used as a practice patch to the current project proposal. This will be used as a reference in the proposal ahead. Mailing list thread: https://lore.kernel.org/git/20240223193257.9222-1-shyamthakkar001@xxxxxxxxx/ - [PATCH v2] setup: remove unnecessary variable Status: Merged to master Merge Commit: 272fd9125aae215ece1c465a4b7d336cf81c3e62 Description: I initially proposed the TODO comment removed through this patch as a micro-project for my first patch. However, the TODO comment was a result of misunderstanding, therefore this patch did some minor code simplifications along with the removal of the TODO comment. Mailing list thread: https://lore.kernel.org/git/20240304151811.511780-1-shyamthakkar001@xxxxxxxxx/ - [PATCH 0/2] commit, add: error out when passing untracked path Status: WIP Description: This patch series would fix the potential bug discovered in the first patch that I sent (--include tests). Mailing list thread: https://lore.kernel.org/git/20240318155219.494206-2-shyamthakkar001@xxxxxxxxx/ - Helped a fellow contributor by reviewing their patches Thread links: - https://lore.kernel.org/git/CZG3HO25QLAG.3Q9V03SCO99HL@xxxxxxxxx/ - https://lore.kernel.org/git/CZHJPF604DV9.X0A0VX1AB7P8@xxxxxxxxx/ Proposed Project ---------------- ----What’s unit testing---- Unit testing is a method to test a specific unit of source code (i.e. a function, a module etc.). This is very helpful in catching all of the corner cases and possible scenarios. ----Steps to convert the existing unit tests to the new framework---- It can be thought of as a series of steps which are outlined below: - Identify a suitable candidate from t/helper directory. - Each t/helper unit test has a corresponding shell script which actually tests the output from the binary. Each test-case in that script would ideally become a separate function (and therefore a single test-case) in the migrated unit test. - Convert the test logic from the shell script to C. This would involve understanding the underlying library/functionality which is being tested. Subsequently, make functions which would be ideally be a single testcase. - These functions can be called via the TEST() macro in the cmd_main function which is the entry point to the test. - Each of these functions contains a set of {check(), check_int() etc.} macro functions which check the condition provided to them and if it fails, prints a message to the stdout. If even a single check fails the whole test-case is considered a failure. - Modify Makefile to add the reference of the new test to the UNIT_TEST_PROGRAMS variable. - The skeleton of the new unit test would look something like this: #include "test-lib.h" static void testcase_1() { ... check(actual == expected); ... } int cmd_main(int argc, const char **argv) { TEST(testcase_1(), "test description"); return test_done(); } - After the migration remove the references of t/helper/{old-test-helper}.{c, o} from the Makefile and from the t/helper/test-tool.{c, h}. And subsequently remove the shell script which used to consume the old helper binary. ----Additional Notes along with a Reference Implementation---- These tests usually contain duplicate logic with variations in input and output. In such cases, we can define macro functions to not duplicate such logic. I.e. in t-ctype, we can see how a custom macro is defined to avoid duplication of test logic, with variation only in the input being used. Also, some these existing tests are not worth migrating. For example, test-tool binaries which are also used outside of the corresponding shell script (i.e. used in other tests for setup purposes) would be low on the priority list of possible migrations. I have made a reference implementation of t-oid-array.c which migrates t/helper/test-oid-array.c and t/0064-oid-array.sh to the new framework, which can be viewed here[2]. Note that the failing tests are not related to my implementation and rather a known issue[3]. It may or may not to be good enough to be merged into master, however it sufficiently showcases my abilities to understand framework, the use of different macros and the ability to understand different internal libraries and data-structures such oidarray itself. This implementation was also reviewed by Christian Couder on the mailing list and I have made some adjustments according to the feedback. This implementation uses a macro to test the ordered enumeration of oidarray with different inputs. And also tests the lookup capabilities of oidarray with different queries and inputs with the help of a macro. There are also various helper functions defined to help with printing debug output, generating hexadecimal representations, etc. ----Previous Work---- TAP implementation along with t-basic and t-strbuf: https://github.com/git/git/commit/e137fe3b t-mempool: https://github.com/git/git/commit/6cbae640 t-ctype: https://github.com/git/git/commit/e875d451 t-prio-queue: https://github.com/git/git/commit/808b77e5 Achu Luma has been doing this migration as part of Outreachy internship. Their branches can be seen here[4]. ----Goal---- The end goal of this project would be to migrate as many unit tests as possible according to the steps outlined above. In doing so, also enhance some of the tests as needed to be extensible such as the ability to test different hash algorithms on a single oidarray. This enhancement is not yet implemented in the reference implementation due to the series which enables the use of different hash algorithms on oidarray not yet merged into master (in next though). Currently, there are 78 test-*.c files (not including test-tool.c) in t/helper/ directory according to the master branch. Of course, migrating all of them would be a difficult task in 350 hours, therefore I plan to migrate around 1 unit test per week during the Community Bonding and Coding Phase. This may fluctuate depending on the type of test. Some of those tests are already being worked on by Achu Luma. ----Timeline---- Pre-GSoC (Until May 1) - Continue to work on different things like solving the bug discovered in the first patch series that I sent. Be engaged in the community. Community Bonding (May 1 - May 26) - Talk with mentors and identify the tests which are suitable for migration. Learn the libraries which are used for those tests and read up on the API. Look at Achu Luma’s branches to further familiarize myself with the migration process. Start the migration early with the mentors’ permission. Phase I (May 27 - July 11) - Continue implementing the tests in the unit testing framework with the guidance of mentors and send out patches to the mailing list for community approval. Approximately 6-7 tests would be done/cooking for migration at the end of this Phase. Phase II (July 12 - Aug 18) - Continue the migration and talk with the mentors about the pace, quality of patches and seek feedback. And enjoy the process. Another 5-6 tests would be done/cooking at the end of this Phase. Final Week (Aug 19 - Aug 26) - Finish up any remaining tests. - Fix any bugs and some final touches. - Make a final report about the work accomplished and outline future work. ----Blogging---- I believe that writing about one’s work helps them track their progress from a bird’s eye view. And also helps them realize if they are making mistakes from a long term perspective. I plan to blog about the progress bi-weekly. Although I have not setup the blog yet, it will be at: https://spectre10.github.io ----Availability---- My current semester ends on 30th April and my exam/viva is tentatively on 7th May. After that I have freed my summer for GSoC, therefore I will be able to give a minimum of 35-40 hours per week. Post-GSoC --------- The Git Community has helped me very much and tolerated my beginner mistakes. I have grown as a developer since I started contributing to Git. I learned the importance of writing good commit messages, benefits of splitting up patches, effective programming style and good communication etiquettes. Therefore, I plan to be involved even after my GSoC journey ends. I am very much involved in the Git ecosystem and also contribute to another project in the Git ecosystem called gittuf[5]. Also post-gsoc, I am planning to help out in aggregating for Git Rev News, which I personally enjoy reading. My Appreciation --------------- I would like extend my thanks to all of the folks on the mailing list and Discord who have helped me in contributing to the Git Project including Junio, Christian Couder, Phillip Wood, Elijah Newren and many others. And I have also taken some references of Achu Luma’s work for this proposal, so thanks to her, too. Thanks & Regards, Ghanshyam References: [1], [2]: https://github.com/spectre10/git/commit/9bfbb22ced3c4d15a215a84aa40043a30dc1bac0 [3]: https://lore.kernel.org/git/20240318090848.GC602575@xxxxxxxxxxxxxxxxxxxxxxx/T/#mc32405849ac4265ad36670480709d744f1d9e495 [4]: https://github.com/achluma/git/branches [5]: https://github.com/gittuf/gittuf