On Wed, Jan 10, 2024 at 12:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > Over the holiday break at the end of last year I spent some time > thinking on what it would take to introduce Rust into the Git project. I'm very happy to see this email. > There is significant work underway to introduce Rust into the Linux > kernel (see [1], [2]). Among their stated goals, I think there are a few > which could be potentially relevant to the Git project: > > - Lower risk of memory safety bugs, data races, memory leaks, etc. > thanks to the language's safety guarantees. > > - Easier to gain confidence when refactoring or introducing new code > in Rust (assuming little to no use of the language's `unsafe` > feature). > > - Contributing to Git becomes easier and accessible to a broader group > of programmers by relying on a more modern language. > > Given the allure of these benefits, I think it's at least worth > considering and discussing how Rust might make its way into Junio's > tree. I think there are other benefits as well; I'll list them at the end of the email to avoid side-tracking too much[6]. > I imagine that the transition state would involve some parts of the > project being built in C and calling into Rust code via FFI (and perhaps > vice-versa, with Rust code calling back into the existing C codebase). > Luckily for us, Rust's FFI provides a zero-cost abstraction [3], meaning > there is no performance impact when calling code from one language in > the other. I agree with the zero-cost abstraction, but there is a funny caveat with measuring it if anyone is curious[7]. > Some open questions from me, at least to get the discussion going are: > > 1. Platform support. The Rust compiler (rustc) does not enjoy the same > widespread availability that C compilers do. For instance, I > suspect that NonStop, AIX, Solaris, among others may not be > supported. > > One possible alternative is to have those platforms use a Rust > front-end for a compiler that they do support. The gccrs [4] > project would allow us to compile Rust anywhere where GCC is > available. The rustc_codegen_gcc [5] project uses GCC's libgccjit > API to target GCC from rustc itself. Another alternative (as discussed at Git Merge when we were last talking about Rust[8]), is requiring all Rust code to be optional for now. If we choose to go that route, I think that means that (a) for existing components, we have both a Rust and a C implementation available, and (b) for new components (e.g. new top-level commands like git-replay), they can be Rust-only and those compiling without Rust just don't get them. > 2. Migration. What parts of Git are easiest to convert to Rust? My > hunch is that the answer is any stand-alone libraries, like > strbuf.h. I'm not sure how we should identify these, though, and in > what order we would want to move them over. If we're happy to allow Rust, I'd like to rewrite git-replay in Rust as a testcase. It's almost certainly not "easiest", but I think it's an interesting testcase because it's a new top-level command that hasn't appeared in any release yet. Further, it is currently only designed for server-side usecases, so would likely not be affected by more limited platform support. (I haven't started on this; my previous experiments were with diffcore-delta.) > I'm curious to hear what others think about this. I think that this > would be an exciting and worthwhile direction for the project. Let's > see! :-) > > Thanks, > Taylor > > [1]: https://rust-for-linux.com/ > [2]: https://lore.kernel.org/rust-for-linux/20210414184604.23473-1-ojeda@xxxxxxxxxx/ > [3]: https://blog.rust-lang.org/2015/04/24/Rust-Once-Run-Everywhere.html#c-talking-to-rust > [4]: https://github.com/Rust-GCC/gccrs > [5]: https://github.com/rust-lang/rustc_codegen_gcc [6] Here are some additional benefits I see: - Parallel performance. We avoid making things parallel in Git because debugging/maintaining/reviewing parallel code in C often isn't worth the squeeze. Rust was designed to greatly reduce this effort (the whole "fearless concurrency" thing). - Single-threaded Performance. Multiple factors: - We had (and might still have) O(N^2) stuff in a lot of places in our codebase, because we tend to over-use arrays. (e.g. with string_list, or with insertions and deletions into the index during a merge, etc.) - Relatedly, using hashes in C is quite onerous, to the point that we often simply avoid it. I know I have, and I also know that even after I introduced strmap and tried to use it outside of merge-ort, that I got pushback because "string hash-maps are not really typical for a C program. I'm sure they are the best choice for an advanced merge algorithm but they are not really necessary [here; let's use sorted arrays instead]..." I then had to go through multiple rounds of responses and ended up reimplementing everything as suggested (before finally convincing others to just use the strmap implementation after all). - We use QSORT() which basically calls libc's qsort(). Due to the design of this function (where the comparator is a separate function call), it is slow. When languages avoid making the comparator a separate function call, they can speed sorts up by a factor of 2 (or even by 3 when an unstable sort is good enough and the platform's qsort() is stable). - Difficulty of incorporating other libraries. For example, our hashmap.[ch] make use of FNV, but picking something else is a big amount of effort. Now, while FNV is faster than Rust's default of SipHash, cargo makes it easy to pull in alternatives like FnvHashMap or FxHashMap, which we can then use where it matters. I'm also tempted to include bullet points for having a unit testing framework built in, and potentially fewer platform-dependent issues (e.g. forgetting to use STABLE_QSORT when required since qsort is stable in some libc implementations, since rust defines those more carefully to be consistent across platforms), but I'm not sure these additional advantages are big enough to merit a full bullet point. [7] If you ignore Rust for a moment, and simply divide your files into different libraries (e.g. introducing a new.c file, moving some functions to it, and then compiling new.c into a new library, libnew.a, and linking both libgit.a and libnew.a into git), you can sometimes measure some small performance differences. At least, I did. What this scenario has to do with Rust is that if we start moving some code to Rust, that will naturally likely result in a different division of files into libraries. Thus, for me to verify that Rust did provide zero-cost abstractions with my experiments, in order to compare the performance of my Rust changes, I had to compare to a version of git where I split some functions out into a separate library. When I did that, the performance overhead was actually 0. Otherwise, there was a tiny performance degradation in the particular splitting I employed. However, while splitting did give me a small performance drop, it was completely outweighed by the performance advantages I got elsewhere in the things I converted to Rust. [8] https://lore.kernel.org/git/ZRrfN2lbg14IOLiK@nand.local/