Hi Emily, On Mon, 16 Sep 2019, Emily Shaffer wrote: > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to > talk about projects and we came up with a trimmed list; not sure what > more needs to be done to make them into fully-fledged proposals. Thank you for doing this! > For starter microprojects, we came up with: > > - cleanup a test script (although we need to identify particularly > which ones and what counts as "clean") > - moving doc from documentation/technical/api-* to comments in the > appropriate header instead > - teach a command which currently handles its own argv how to use > parse-options instead > - add a user.timezone option which Git can use if present rather than > checking system local time Nice projects, all. There are a couple more ideas on https://github.com/gitgitgadget/git/issues, they could probably use some tagging. > For the longer projects, we came up with a few more: > > - find places where we can pass in the_repository as arg instead of > using global the_repository Good project, if a bit boring ;-) Also, `the_index` is used a lot in `builtin/*.c`, still. > - convert sh/pl commands to C, including: > - git-submodules.sh I am of two minds there. But mostly, I am of the "friends don't let friends use submodules" camp, so I would not even want to mentor for this project: it just makes me shudder too much every time I have to work with/on submodules. Of course, if others are interested, I'd hardly object to turn this into a built-in. > - git-bisect.sh That would be my top recommendation, especially given how much effort Tanushree put in last winter to make this conversion to C so much more achievable than before. > - rebase --preserve-merges No. `rebase -p` is already deprecated in favor of `rebase -r` (which _is_ already built-in). I already have patches lined up to drop that rebase backend. Let's not waste effort on converting this script to C. > - add -i Please see PRs #170-#175 on https://github.com/gitgitgadget/git/pulls, and please do help by adding your review of #170 (which was already submitted as v4: https://public-inbox.org/git/pull.170.v4.git.gitgitgadget@xxxxxxxxx/ In other words: this project is well under way. In fact, Git for Windows users enjoy this as an opt-in already. > (We were afraid this might be too boring, though.) Converting shell/Perl scripts into built-in C never looks as much fun as open-ended projects with lots of playing around, but the advantage of the former is that they can be easily structured, offer a lot of opportunity for learning, and they are ultimately more rewarding because the goals are much better defined than many other projects'. Another script that would _really_ benefit from being converted to C: `mergetool`. Especially on Windows, where the over-use of spawned processes really hurts, it is awfully slow a command. To complete the list of sh/pl commands: - git-merge-octopus.sh - git-merge-one-file.sh - git-merge-resolve.sh These seem to be good candidates for conversion to built-ins. Their functionality is well-exercised in the test suite, their complexity is quite manageable, and there is no good reason that these should be scripted. The only slightly challenging aspect might be that `merge-one-file` is actually not a merge strategy, but it is used as helper to be passed to `git merge-index` via the `-o <helper>` option, which makes it slightly awkward to be implemented as a built-in. A better approach would therefore be to special-case this value in `merge-index` and execute the C code directly, without the detour of spawning a built-in. - git-difftool--helper.sh - git-mergetool--lib.sh These would be converted as part of making `mergetool` a built-in, I believe. - git-filter-branch.sh This one is in the process of being deprecated in favor of `git filter-repo` (which is an external tool), so I don't think there would be much use in wasting energy on trying to convert it to C. Especially given that it wants to call shell script snippets all over the place, and those shell script snippets are supposed to run in the same context, which might actually make it completely impossible to convert this to C at all. - git-legacy-stash.sh This will go away once the built-in stash is considered good enough. - git-instaweb.sh - git-request-pull.sh - git-send-email.perl - git-web--browse.sh I don't think that any of these should be converted. They are just too unimportant from a performance point of view, and obscure enough that even their portability issues don't matter too much. As to `send-email` in particular: I would not want anybody to drag in all the dependencies required to convert `send-email` to a built-in to begin with. - git-archimport.perl - git-cvsexportcommit.perl - git-cvsimport.perl - git-cvsserver.perl - git-quiltimport.sh - git-svn.perl These are all connectors of some sort to other version control software. It also feels like they become less and less important, as Git really takes over the world. At some stage, I think, it would make sense to push those scripts out into their own repositories, looking for new maintainers (and if none can be found, then there really is not enough need for them to begin with, and they can be archived). > - reduce/eliminate use of fetch_if_missing global > - create a better difftool/mergetool for format of choice (this one > ends up existing outside of the Git codebase, but still may be pretty > adjacent and big impact) > - training wheels/intro/tutorial mode? (We thought it may be useful to > make available a very basic "I just want to make a single PR and not > learn graph theory" mode, toggled by config switch) > - "did you mean?" for common use cases, e.g. commit with a dirty > working tree and no staged files - either offer a hint or offer a > prompt to continue ("Stage changed files and commit? [Y/n]") > - new `git partial-clone` command to interactively set a filter, > configure other partial clone settings > - add progress bars in various situations > - add a TUI to deal more easily with the mailing list. Jonathan Tan has > a strong idea of what this TUI would do... This one would also end up > external but adjacent to the Git codebase. I don't think that this would be a good project for anybody except people who are already really, really familiar with our mailing list-centric workflow. > - try and make progress towards running many tests from a single test > file in parallel - maybe this is too big, I'm not sure if we know how > many of our tests are order-dependent within a file for now... Another, potentially more rewarding, project would be to modernize our test suite framework, so that it is not based on Unix shell scripting, but on C instead. The fact that it is based on Unix shell scripting not only costs a lot of speed, especially on Windows, it also limits us quite a bit, and I am talking about a lot more than just the awkwardness of having to think about options of BSD vs GNU variants of common command-line tools. For example, many, many, if not all, test cases, spend the majority of their code on setting up specific scenarios. I don't know about you, but personally I have to dive into many of them when things fail (and I _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really have to say that most of that code is hard to follow and does not make it easy to form a mental model of what the code tries to accomplish. To address this, a while ago Thomas Rast started to use `fast-export`ed commit histories in test scripts (see e.g. `t/t3206/history.export`). I still find that this fails to make it easier for occasional readers to understand the ideas underlying the test cases. Another approach is to document heavily the ideas first, then use code to implement them. For example, t3430 starts with this: [...] Initial setup: -- B -- (first) / \ A - C - D - E - H (master) \ \ / \ F - G (second) \ Conflicting-G [...] test_commit A && git checkout -b first && test_commit B && git checkout master && test_commit C && test_commit D && git merge --no-commit B && test_tick && git commit -m E && git tag -m E E && git checkout -b second C && test_commit F && test_commit G && git checkout master && git merge --no-commit G && test_tick && git commit -m H && git tag -m H H && git checkout A && test_commit conflicting-G G.t [...] While this is _somewhat_ better than having only the code, I am still unhappy about it: this wall of `test_commit` lines interspersed with other commands is very hard to follow. If we were to (slowly) convert our test suite framework to C, we could change that. One idea would be to allow recreating commit history from something that looks like the output of `git log`, or even `git log --graph --oneline`, much like `git mktree` (which really should have been a test helper instead of a Git command, but I digress) takes something that looks like the output of `git ls-tree` and creates a tree object from it. Another thing that would be much easier if we moved more and more parts of the test suite framework to C: we could implement more powerful assertions, a lot more easily. For example, the trace output of a failed `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more focused on what is going wrong than on cluttering the terminal window with almost useless lines which are tedious to sift through. Likewise, having a framework in C would make it a lot easier to improve debugging, e.g. by making test scripts "resumable" (guarded by an option, it could store a complete state, including a copy of the trash directory, before executing commands, which would allow "going back in time" and calling a failing command with a debugger, or with valgrind, or just seeing whether the command would still fail, i.e. whether the test case is flaky). Also, things like the code tracing via `-x` (which relies on Bash functionality in order to work properly, and which _still_ does not work as intended if your test case evaluates a lazy prereq that has not been evaluated before) could be "done right". In many ways, our current test suite seems to test Git's functionality as much as (core) contributors' abilities to implement test cases in Unix shell script, _correctly_, and maybe also contributors' patience. You could say that it tests for the wrong thing at least half of the time, by design. It might look like a somewhat less important project, but given that we exercise almost 150,000 test cases with every CI build, I think it does make sense to grind our axe for a while, so to say. Therefore, it might be a really good project to modernize our test suite. To take ideas from modern test frameworks such as Jest and try to bring them to C. Which means that new contributors would probably be better suited to work on this project than Git old-timers! And the really neat thing about this project is that it could be done incrementally. > It might make sense to only focus on scoping the ones we feel most > interested in. We came up with a pretty big list because we had some > other programs in mind, so I suppose it's not necessary to develop all > of them for this program. I don't find that list particularly big, to be honest ;-) Ciao, Dscho