I am sorry for sending my proposal late. This unfortunately forces this to basically be the final draft. If you have any advice on things I could change, I would still appreciate them. Thank you ## More Sparse Index Integrations ### Personal Info Name: Raghul Nanth Email: [nanth.raghul@xxxxxxxxx](mailto:nanth.raghul@xxxxxxxxx) Mobile No: (+91) 6382298677 Education: National Institute of Technology, Tiruchirapalli Major: Computer Science Github: [NanthR](https://github.com/NanthR) ### Project Synopsis I would like to work on the "More Sparse Index Integrations" from the [SOC 2023 Ideas page](https://git.github.io/SoC-2023-Ideas/), which aims to integrate "sparse-index" with the remaining git commands, wherever possible. The project is expected to be of medium difficulty, and take up approximately 175 to 350 hours. **Sparse-index** is a feature that reduces the working directoy's index (a Git structure where information about the files tracked by git is maintained), and allows it to work with "sparse-checkouts". This makes certain commands like "git-add" or "git-commit" faster to execute. **Sparse-checkout** is a feature that allows the user to restrict their working directory with only the files one is working on currently. This is useful when the user only has to modify a small subsection of a given project ### About me I have been involved in development for the better part of 3 years. I have worked on some medium sized projects, but have never really been involved with open-source. My participation in the Git community started fairly late, but in the little time I had, I have been able to understand the Git workflow, the internals of the project, commonly used functions and test setup. The documentation, especially the [MyFirstContribution.txt](https://github.com/git/git/blob/master/Documentation/MyFirstContribution.txt), [MyFirstObjectWalk.txt](https://github.com/git/git/blob/master/Documentation/MyFirstObjectWalk.txt) and [sparse-index.txt](https://github.com/git/git/blob/master/Documentation/technical/sparse-index.txt) helped me get up to speed on both the technical aspects I needed to understand and the prerequisites for contribution. I do anticipate my prior experience with the project will make the further progress a lot smoother. ### Community benefits Sparse-index aims to provide a better working experience for people working with large monorepos, and my work on this project could enable more commands that people use to work with sparse-index, thus enabling speedups in more of their workflow. ### Related Work - [Integration with "grep"](https://lore.kernel.org/git/20220817075633.217934-1-shaoxuan.yuan02@xxxxxxxxx/) - [Integration with "rm"](https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@xxxxxxxxx/) ### Patches (Current Work) - [describe: enable sparse-index](https://lore.kernel.org/git/pull.1480.v3.git.git.1680155957146.gitgitgadget@xxxxxxxxx/T/#m7bf44d073e179c5715946c00ce805fec23f64c19) - **Status**: In review - **Description**: Add sparse-index integration for describe. Add functional and performance tests for the same. - [diff-index: enable sparse-index](https://lore.kernel.org/git/20230403190538.361840-1-nanth.raghul@xxxxxxxxx/T/#u) - **Status**: WIP - **Description**: Add sparse-index integration for diff-index. Add functional and performance tests for the same. ### Plan The general idea of integration remains the all the involved commands, depending on how they are setup, and based on the current logic of the command involved, we might first have to alter the current logic before attempting to add sparse-index support. This is based on Shaoxuan Yuan's ideas. 1. Investigate the command's logic and modify it wherever necessary (Especially compatibility with sparse-checkouts). Add tests for the same. [7 - 10 days] 2. Disable the command_requires_full_index setting in the building and ensure that the current intended functionality is intact. 3. Add tests to t1092-sparse-checkout-compatibility.sh for the built-in. Verify functionality, and if the command interacts with the working tree, make sure to test both in-cone and out-of-cone behavior. 4. Add tests to ensure that the index is not expanded by the command. 5. Add performance tests to demonstrate speedup (p2000-sparse-operations.sh) [Points 2 - 5 should be able to be completed in approximately 8 - 15 days ] The first step will not be a necessity in all the commands and hence can be skipped if not necessary. Without extending the timeline, 4 - 5 integrations are expected to be completed by the end of the GSOC program period. ### Timeline I would be able to officially start the project as soon as the start of the Community Bonding Period (May 4). I am confident of this as I already have a basic understanding of the process I would need to follow to accomplish the goals of the project. Even though exact dates are hard, there should roughly be, from my estimation, **one sparse-index integration every 20ish days**, starting on May 4. Integration schedule: - git-diff-tree - git-write-tree - git-worktree - git-check-attr - git-am Depending on the underlying implementation functions that these commands use, they could have already been made sparse index aware. In that case, the time to integration reduces significantly, as the only additions would be to change the command_requires_full_index option and add tests. ### Availability I will be completely free during the months of May-June and I will be able to fully spend my time for the project. I would be starting work after that, so I would need to juggle my time between the two. I do not have the most free schedule, but I think I can manage my time enough to be able to contribute effectively to the project. I will be able to work on average 6 hours a week for 6 days a week during the initial period. Following that, my contributions during the weekdays will decrease, but I believe I can make up for that during the weekends. ### Post GSoC I understand the importance of having GSOC participants who continue to be part of the community after the project ends. And I do intend to honor that idea. Being part of open-source is always a cool idea, and having the opportunity to help with something as widespread as "git" will always be exciting to me. I also feel I will be able to learn and make some important contributions to the project with my continued participation. I would also ideally like to create blogs about the development of Git. The methods and systems Git uses for its development is not apparent to a new developer and giving a more gentler introduction could help more people get into the project. Thanks, Kind Regards, Raghul ### References - [Make your monorepo feel small with Git’s sparse index](https://github.blog/2021-11-10-make-your-monorepo-feel-small-with-gits-sparse-index/), Stolee - [Bring your monorepo down to size with sparse-checkout](https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/), Stolee