On 2024-01-10 at 20:16:53, Taylor Blau wrote: > Over the holiday break at the end of last year I spent some time > thinking on what it would take to introduce Rust into the Git project. > > There is significant work underway to introduce Rust into the Linux > kernel (see [1], [2]). Among their stated goals, I think there are a few > which could be potentially relevant to the Git project: > > - Lower risk of memory safety bugs, data races, memory leaks, etc. > thanks to the language's safety guarantees. > > - Easier to gain confidence when refactoring or introducing new code > in Rust (assuming little to no use of the language's `unsafe` > feature). I agree with both of these points. We've found that making our code thread safe in Git is hard and it's much easier in Rust, because, for the most part, the code doesn't compile if it would have a data race. Unit tests are also easy and built-in, and I think that's a major advantage. We also get nice things for free, like sets, maps, lists, and a variety of other collections that are all type-safe. Error handling is also a huge benefit: we'll get typed errors with the ability to pass data back. > - Contributing to Git becomes easier and accessible to a broader group > of programmers by relying on a more modern language. I think this can't be understated. One of the biggest hurdles for people contributing is that our code requires expert knowledge of C. We do all sorts of weird things with pointer arithmetic that even I have trouble understanding, and I'd really appreciate not having to worry about memory leaks or freeing resources.[0] Rust has nice things like the Drop trait that make resource management easy. Rust is also a language that people _want_ to use. I really like it and would probably contribute more if Git were in it. I don't really want to write more C, and outside of Git, won't use it on more than a de minimis basis unless paid. I can confirm that, having partially ported our service that serves Git traffic to Rust from C (without the public having noticed), it's a much nicer environment to work in. I'm also much more efficient at making changes as well. > Given the allure of these benefits, I think it's at least worth > considering and discussing how Rust might make its way into Junio's > tree. A couple of things which I think are worth discussing are as follows: The Rust project emits a new release every six weeks and doesn't provide LTS versions. What versions of Rust are supported by crates vary widely, and we'll absolutely need to choose our dependencies wisely. We may also want to ask crate authors if they'll be willing to commit to our version policy before using them; oftentimes, that can work. The approach that I aim for is supporting the version of Rust in the latest Debian stable, plus the version in Debian's previous stable release until the latest stable has been out for a year. (Thus, if Debian 12 was released on 2023-06-10, then I'd support Rust 1.48, Debian 11's version, until 2024-06-10, and then support would move to 1.63, Debian 12's version.) This provides about three years of support for a compiler version, which I think is fair. Note that none of this means that we're dropping support for older systems; newer versions of Rust will be available for most targets, even often after OSes go end of life. We'll also probably need to continue to rely on some C libraries. For example, reqwest, the main Rust HTTP client, doesn't support any authentication other than Basic, and I assure you from my experience as the Git LFS maintainer, we don't want to implement things like NTLM and Kerberos on our own. libcurl is almost certainly going to continue to be a dependency, as will PCRE. The Rust regex crate doesn't support backreferences, and we've basically tied lots of our regexes to POSIX, so we'll need to either rely on PCRE or some call out to a POSIX-compatible interface. gettext is likely to be another issue, although its thread-safety is potentially a problem; we could try using the `tr` crate instead, which also provides a Rust-specific string ripper. > I imagine that the transition state would involve some parts of the > project being built in C and calling into Rust code via FFI (and perhaps > vice-versa, with Rust code calling back into the existing C codebase). > Luckily for us, Rust's FFI provides a zero-cost abstraction [3], meaning > there is no performance impact when calling code from one language in > the other. Moreover, there are even ways to generate Rust bindings for C code and C headers for Rust code automatically. (These are cbindgen and bindgen, respectively.) I've used both, and while it's clearly an FFI case, it's still very ergonomic. > Some open questions from me, at least to get the discussion going are: > > 1. Platform support. The Rust compiler (rustc) does not enjoy the same > widespread availability that C compilers do. For instance, I > suspect that NonStop, AIX, Solaris, among others may not be > supported. > > One possible alternative is to have those platforms use a Rust > front-end for a compiler that they do support. The gccrs [4] > project would allow us to compile Rust anywhere where GCC is > available. The rustc_codegen_gcc [5] project uses GCC's libgccjit > API to target GCC from rustc itself. I think this is probably the biggest stumbling point. I know GCC is highly portable and works on AIX, as well as virtually every architecture. gccrs is still incomplete, but I believe rustc_codegen_gcc is mature, and should be a viable option for most platforms. (Solaris is already supported on Rust[1].) My main concerns are with NonStop, since the Rust standard library requires threading and a CSPRNG (although that can definitely be RDRAND, and is for some targets). I seem to recall that neither GCC nor LLVM are present there, although I see no reason why GCC could not be ported (LLVM lacks support for ia64, I believe, which would make it a bigger lift) I suspect that if we go forward, though, a lot of the work for architecture support in Rust upstream will already have been done, since I'm pretty sure the Debian porters for architectures like alpha, hppa, and ia64 are going to want to continue to use Git. NetBSD porters may also have useful patches in pkgsrc. I am also very sympathetic to the difficulties of running on less common systems, having had a PowerPC Mac running Linux as my first laptop and several UltraSPARC machines. I have sent in numerous patches to a wide variety of code so that it works gracefully on lots of architectures, and I've also dealt with lots of broken software. I do, however, think it's up to the porters of an OS to keep it running and healthy, and that means making sure it has suitable compiler toolchains for building, including for modern, extremely popular languages like Rust and Go. I'm okay with dropping support for systems where nobody upstream wants to or is capable of maintaining that tooling. I actually feel that once Rust is running on a system, it's actually easier to write portable code, since you don't have alignment issues and endianness must be handled explicitly, and most safe Rust code just works out of the box. > 2. Migration. What parts of Git are easiest to convert to Rust? My > hunch is that the answer is any stand-alone libraries, like > strbuf.h. I'm not sure how we should identify these, though, and in > what order we would want to move them over. strbuf.h is tricky because it uses variadic arguments, which are not stable in Rust. My approach would be to start by getting the main function up and running, and then we can incrementally port things over. We could, for example, use the `sha256` crate for our SHA-256 code (which would also dynamically use accelerated hardware implementations where available). There are other things which are libraries which could well work, though. Porting over our hashmap implementation might be a thing to do, for example. The repository structure might also be a good idea, since that will allow us to write safe wrappers for its contents. > 3. Interaction with the lib-ification effort. There is lots of work > going on in an effort to lib-ify much of the Git codebase done by > Google. I'm not sure how this would interact with that effort, but > we should make sure that one isn't a blocker for the other. I think it's going to work together nicely. We can and should consider building a C library from Rust to expose a lot of what we write. Also, in my view, the biggest enemy to libification in our codebase is our copious and improvident use of globals. Mutating static variables in Rust is unsafe, so as part of the port, we'll need to get rid of them, which seems like a nice common goal. > I'm curious to hear what others think about this. I think that this > would be an exciting and worthwhile direction for the project. Let's > see! I'm very much in favour of this. I think I brought it up at the contributor's summit and it caught some attention, but I don't think it should be too controversial and it will offer us a lot of advantages. [0] And before people say, "Well, you just need to spend more time with C," I've been writing it since I was 10 and I think we can all agree that with the SHA-256 work I've spent plenty of time with it. [1] rustc --print target-list is a great way to see what's supported. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA
Attachment:
signature.asc
Description: PGP signature