A quick note: normally I'd wait longer for more feedback before sending out a new reroll, but due to some firefighting at $DAYJOB, I will not have time to focus on this series for the next several weeks, possibly up to a month. I wanted to get V5 out before then, but please understand that I will not be able to address new feedback for a while. This series provides two small Rust wrapper libraries around parts of Git: "libgit-sys", which exposes a few functions from libgit.a, and "libgit", which provides a more Rust-friendly interface to some of those functions. In addition to included unit tests, at $DAYJOB we have tested building JJ[1] with our library and used it to replace some of the libgit2-rs uses. [1] https://github.com/martinvonz/jj Please find V1 cover letter here: https://lore.kernel.org/git/cover.1723054623.git.steadmon@xxxxxxxxxx/ Known NEEDSWORK: * Investigate alternative methods of managing symbol visibility & renaming. * Figure out symbol versioning Changes in V5: * When building with INCLUDE_LIBGIT_RS defined, add "-fvisibility=hidden" to CFLAGS. This allows us to manage symbol visibility in libgitpub.a without causing `make all` to rebuild from scratch due to changing CFLAGS. * Avoid using c_int in the higher-level Rust API. * Remove libgitpub.a and intermediate files with `make clean`. Changes in V4: * Drop V3 patch #3, which added wrappers around repository initialization and config access. These are not well-libified, and they are not necessary for JJ's proof-of-concept use case, so let's avoid exporting them for now. * Set a minimum supported Rust version of 1.63. Autodetect whether our Rust version has c_int and c_char types; if not, define them ourselves. * When building libgitpub.a via build.rs, set DEVELOPER=1 to catch additional errors at build time. * In build.rs, use the make_cmd crate to portable select the correct invocation of GNU Make. * Follow naming standards for _alloc() and _free() functions. * Use String instead of CString in higher-level API. * Move libgit_configset_alloc() and libgit_configset_free() out of upstream Git, to the libgitpub shim library. * In libgitpub, initialize libgit_config_set structs in the _alloc() function rather than with a separate _init() function. * Remove unnecessary comments in libgit-sys showing where the wrapped functions were originally defined. * Fix clippy lint: don't reborrow configfile path references. * Various typo fixes and `cargo fmt` fixes. Changes in V3: * Renamed cgit-rs to libgit-rs and cgit-sys to libgit-sys * Makefile cleanup, particularly adding config.mak options that developers can set to run Rust builds and tests by default (Patch 6) * Provide testdata configs for unit tests * ConfigSet API now uses &Path instead of &str -- more ergonomic for Rust users to pass in and errors out if the path string isn't UTF-8 * Fixed unresolved dependency on libz in Cargo.toml Calvin Wan (2): libgit: add higher-level libgit crate Makefile: add option to build and test libgit-rs and libgit-rs-sys Josh Steadmon (3): common-main: split init and exit code into new files libgit-sys: introduce Rust wrapper for libgit.a libgit-sys: also export some config_set functions .gitignore | 2 + Makefile | 44 +++++++++ common-exit.c | 26 +++++ common-init.c | 63 ++++++++++++ common-init.h | 6 ++ common-main.c | 83 +--------------- contrib/libgit-rs/Cargo.lock | 77 +++++++++++++++ contrib/libgit-rs/Cargo.toml | 15 +++ contrib/libgit-rs/build.rs | 4 + contrib/libgit-rs/libgit-sys/Cargo.lock | 69 ++++++++++++++ contrib/libgit-rs/libgit-sys/Cargo.toml | 17 ++++ contrib/libgit-rs/libgit-sys/README.md | 15 +++ contrib/libgit-rs/libgit-sys/build.rs | 35 +++++++ .../libgit-sys/public_symbol_export.c | 50 ++++++++++ .../libgit-sys/public_symbol_export.h | 18 ++++ contrib/libgit-rs/libgit-sys/src/lib.rs | 79 +++++++++++++++ contrib/libgit-rs/src/lib.rs | 95 +++++++++++++++++++ contrib/libgit-rs/testdata/config1 | 2 + contrib/libgit-rs/testdata/config2 | 2 + contrib/libgit-rs/testdata/config3 | 2 + t/Makefile | 16 ++++ 21 files changed, 639 insertions(+), 81 deletions(-) create mode 100644 common-exit.c create mode 100644 common-init.c create mode 100644 common-init.h create mode 100644 contrib/libgit-rs/Cargo.lock create mode 100644 contrib/libgit-rs/Cargo.toml create mode 100644 contrib/libgit-rs/build.rs create mode 100644 contrib/libgit-rs/libgit-sys/Cargo.lock create mode 100644 contrib/libgit-rs/libgit-sys/Cargo.toml create mode 100644 contrib/libgit-rs/libgit-sys/README.md create mode 100644 contrib/libgit-rs/libgit-sys/build.rs create mode 100644 contrib/libgit-rs/libgit-sys/public_symbol_export.c create mode 100644 contrib/libgit-rs/libgit-sys/public_symbol_export.h create mode 100644 contrib/libgit-rs/libgit-sys/src/lib.rs create mode 100644 contrib/libgit-rs/src/lib.rs create mode 100644 contrib/libgit-rs/testdata/config1 create mode 100644 contrib/libgit-rs/testdata/config2 create mode 100644 contrib/libgit-rs/testdata/config3 Range-diff against v4: 4: 2ed503216f ! 1: 1ae14207f6 Makefile: add option to build and test libgit-rs and libgit-rs-sys @@ ## Metadata ## -Author: Calvin Wan <calvinwan@xxxxxxxxxx> +Author: Josh Steadmon <steadmon@xxxxxxxxxx> ## Commit message ## - Makefile: add option to build and test libgit-rs and libgit-rs-sys + common-main: split init and exit code into new files - Add libgitrs, libgitrs-sys, libgitrs-test, and libgitrs-sys-test targets - to their respective Makefiles so they can be built and tested without - having to run cargo build/test. + Currently, object files in libgit.a reference common_exit(), which is + contained in common-main.o. However, common-main.o also includes main(), + which references cmd_main() in git.o, which in turn depends on all the + builtin/*.o objects. - Add environment variable, INCLUDE_LIBGIT_RS, that when set, - automatically builds and tests libgit-rs and libgit-rs-sys when `make - all` is ran. + We would like to allow external users to link libgit.a without needing + to include so many extra objects. Enable this by splitting common_exit() + and check_bug_if_BUG() into a new file common-exit.c, and add + common-exit.o to LIB_OBJS so that these are included in libgit.a. - Signed-off-by: Calvin Wan <calvinwan@xxxxxxxxxx> + This split has previously been proposed ([1], [2]) to support fuzz tests + and unit tests by avoiding conflicting definitions for main(). However, + both of those issues were resolved by other methods of avoiding symbol + conflicts. Now we are trying to make libgit.a more self-contained, so + hopefully we can revisit this approach. + + Additionally, move the initialization code out of main() into a new + init_git() function in its own file. Include this in libgit.a as well, + so that external users can share our setup code without calling our + main(). + + [1] https://lore.kernel.org/git/Yp+wjCPhqieTku3X@xxxxxxxxxx/ + [2] https://lore.kernel.org/git/20230517-unit-tests-v2-v2-1-21b5b60f4b32@xxxxxxxxxx/ + + Signed-off-by: Josh Steadmon <steadmon@xxxxxxxxxx> ## Makefile ## -@@ Makefile: build-unit-tests: $(UNIT_TEST_PROGS) - unit-tests: $(UNIT_TEST_PROGS) t/helper/test-tool$X - $(MAKE) -C t/ unit-tests +@@ Makefile: LIB_OBJS += combine-diff.o + LIB_OBJS += commit-graph.o + LIB_OBJS += commit-reach.o + LIB_OBJS += commit.o ++LIB_OBJS += common-exit.o ++LIB_OBJS += common-init.o + LIB_OBJS += compat/nonblock.o + LIB_OBJS += compat/obstack.o + LIB_OBJS += compat/terminal.o + + ## common-exit.c (new) ## +@@ ++#include "git-compat-util.h" ++#include "trace2.h" ++ ++static void check_bug_if_BUG(void) ++{ ++ if (!bug_called_must_BUG) ++ return; ++ BUG("on exit(): had bug() call(s) in this process without explicit BUG_if_bug()"); ++} ++ ++/* We wrap exit() to call common_exit() in git-compat-util.h */ ++int common_exit(const char *file, int line, int code) ++{ ++ /* ++ * For non-POSIX systems: Take the lowest 8 bits of the "code" ++ * to e.g. turn -1 into 255. On a POSIX system this is ++ * redundant, see exit(3) and wait(2), but as it doesn't harm ++ * anything there we don't need to guard this with an "ifdef". ++ */ ++ code &= 0xff; ++ ++ check_bug_if_BUG(); ++ trace2_cmd_exit_fl(file, line, code); ++ ++ return code; ++} + + ## common-init.c (new) ## +@@ ++#define USE_THE_REPOSITORY_VARIABLE ++ ++#include "git-compat-util.h" ++#include "common-init.h" ++#include "exec-cmd.h" ++#include "gettext.h" ++#include "attr.h" ++#include "repository.h" ++#include "setup.h" ++#include "strbuf.h" ++#include "trace2.h" ++ ++/* ++ * Many parts of Git have subprograms communicate via pipe, expect the ++ * upstream of a pipe to die with SIGPIPE when the downstream of a ++ * pipe does not need to read all that is written. Some third-party ++ * programs that ignore or block SIGPIPE for their own reason forget ++ * to restore SIGPIPE handling to the default before spawning Git and ++ * break this carefully orchestrated machinery. ++ * ++ * Restore the way SIGPIPE is handled to default, which is what we ++ * expect. ++ */ ++static void restore_sigpipe_to_default(void) ++{ ++ sigset_t unblock; ++ ++ sigemptyset(&unblock); ++ sigaddset(&unblock, SIGPIPE); ++ sigprocmask(SIG_UNBLOCK, &unblock, NULL); ++ signal(SIGPIPE, SIG_DFL); ++} ++ ++void init_git(const char **argv) ++{ ++ struct strbuf tmp = STRBUF_INIT; ++ ++ trace2_initialize_clock(); ++ ++ /* ++ * Always open file descriptors 0/1/2 to avoid clobbering files ++ * in die(). It also avoids messing up when the pipes are dup'ed ++ * onto stdin/stdout/stderr in the child processes we spawn. ++ */ ++ sanitize_stdfds(); ++ restore_sigpipe_to_default(); ++ ++ git_resolve_executable_dir(argv[0]); ++ ++ setlocale(LC_CTYPE, ""); ++ git_setup_gettext(); ++ ++ initialize_repository(the_repository); ++ ++ attr_start(); ++ ++ trace2_initialize(); ++ trace2_cmd_start(argv); ++ trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP); ++ ++ if (!strbuf_getcwd(&tmp)) ++ tmp_original_cwd = strbuf_detach(&tmp, NULL); ++} + + ## common-init.h (new) ## +@@ ++#ifndef COMMON_INIT_H ++#define COMMON_INIT_H ++ ++void init_git(const char **argv); ++ ++#endif /* COMMON_INIT_H */ + + ## common-main.c ## +@@ +-#define USE_THE_REPOSITORY_VARIABLE +- + #include "git-compat-util.h" +-#include "exec-cmd.h" +-#include "gettext.h" +-#include "attr.h" +-#include "repository.h" +-#include "setup.h" +-#include "strbuf.h" +-#include "trace2.h" +- +-/* +- * Many parts of Git have subprograms communicate via pipe, expect the +- * upstream of a pipe to die with SIGPIPE when the downstream of a +- * pipe does not need to read all that is written. Some third-party +- * programs that ignore or block SIGPIPE for their own reason forget +- * to restore SIGPIPE handling to the default before spawning Git and +- * break this carefully orchestrated machinery. +- * +- * Restore the way SIGPIPE is handled to default, which is what we +- * expect. +- */ +-static void restore_sigpipe_to_default(void) +-{ +- sigset_t unblock; +- +- sigemptyset(&unblock); +- sigaddset(&unblock, SIGPIPE); +- sigprocmask(SIG_UNBLOCK, &unblock, NULL); +- signal(SIGPIPE, SIG_DFL); +-} ++#include "common-init.h" -+.PHONY: libgitrs-sys -+libgitrs-sys: -+ $(QUIET)(\ -+ cd contrib/libgit-rs/libgit-sys && \ -+ cargo build \ -+ ) -+.PHONY: libgitrs -+libgitrs: -+ $(QUIET)(\ -+ cd contrib/libgit-rs && \ -+ cargo build \ -+ ) -+ifdef INCLUDE_LIBGIT_RS -+all:: libgitrs -+endif -+ - contrib/libgit-rs/libgit-sys/partial_symbol_export.o: contrib/libgit-rs/libgit-sys/public_symbol_export.o libgit.a reftable/libreftable.a xdiff/lib.a - $(LD) -r $^ -o $@ + int main(int argc, const char **argv) + { + int result; +- struct strbuf tmp = STRBUF_INIT; +- +- trace2_initialize_clock(); +- +- /* +- * Always open file descriptors 0/1/2 to avoid clobbering files +- * in die(). It also avoids messing up when the pipes are dup'ed +- * onto stdin/stdout/stderr in the child processes we spawn. +- */ +- sanitize_stdfds(); +- restore_sigpipe_to_default(); +- +- git_resolve_executable_dir(argv[0]); +- +- setlocale(LC_CTYPE, ""); +- git_setup_gettext(); +- +- initialize_repository(the_repository); +- +- attr_start(); +- +- trace2_initialize(); +- trace2_cmd_start(argv); +- trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP); +- +- if (!strbuf_getcwd(&tmp)) +- tmp_original_cwd = strbuf_detach(&tmp, NULL); - - ## t/Makefile ## -@@ t/Makefile: perf: ++ init_git(argv); + result = cmd_main(argc, argv); - .PHONY: pre-clean $(T) aggregate-results clean valgrind perf \ - check-chainlint clean-chainlint test-chainlint $(UNIT_TESTS) -+ -+.PHONY: libgitrs-sys-test -+libgitrs-sys-test: -+ $(QUIET)(\ -+ cd ../contrib/libgit-rs/libgit-sys && \ -+ cargo test \ -+ ) -+.PHONY: libgitrs-test -+libgitrs-test: -+ $(QUIET)(\ -+ cd ../contrib/libgit-rs && \ -+ cargo test \ -+ ) -+ifdef INCLUDE_LIBGIT_RS -+all:: libgitrs-sys-test libgitrs-test -+endif + /* Not exit(3), but a wrapper calling our common_exit() */ + exit(result); + } +- +-static void check_bug_if_BUG(void) +-{ +- if (!bug_called_must_BUG) +- return; +- BUG("on exit(): had bug() call(s) in this process without explicit BUG_if_bug()"); +-} +- +-/* We wrap exit() to call common_exit() in git-compat-util.h */ +-int common_exit(const char *file, int line, int code) +-{ +- /* +- * For non-POSIX systems: Take the lowest 8 bits of the "code" +- * to e.g. turn -1 into 255. On a POSIX system this is +- * redundant, see exit(3) and wait(2), but as it doesn't harm +- * anything there we don't need to guard this with an "ifdef". +- */ +- code &= 0xff; +- +- check_bug_if_BUG(); +- trace2_cmd_exit_fl(file, line, code); +- +- return code; +-} 1: ed925d6a33 ! 2: 1ed010c378 libgit-sys: introduce Rust wrapper for libgit.a @@ Makefile: clean: profile-clean coverage-clean cocciclean $(MAKE) -C Documentation/ clean $(RM) Documentation/GIT-EXCLUDED-PROGRAMS + $(RM) -r contrib/libgit-rs/libgit-sys/target ++ $(RM) -r contrib/libgit-rs/libgit-sys/partial_symbol_export.o ++ $(RM) -r contrib/libgit-rs/libgit-sys/hidden_symbol_export.o ++ $(RM) -r contrib/libgit-rs/libgit-sys/libgitpub.a ifndef NO_PERL $(RM) -r perl/build/ endif 2: 8eeab7b418 = 3: 00762b57d0 libgit-sys: also export some config_set functions 3: 29599e9c7b ! 4: 4e5360931b libgit: add higher-level libgit crate @@ Metadata ## Commit message ## libgit: add higher-level libgit crate - Wrap `struct config_set` and a few of its associated functions in - libgit-sys. Also introduce a higher-level "libgit" crate which provides - a more Rust-friendly interface to config_set structs. + The C functions exported by libgit-sys do not provide an idiomatic Rust + interface. To make it easier to use these functions via Rust, add a + higher-level "libgit" crate, that wraps the lower-level configset API + with an interface that is more Rust-y. + + This combination of $X and $X-sys crates is a common pattern for FFI in + Rust, as documented in "The Cargo Book" [1]. + + [1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages Co-authored-by: Josh Steadmon <steadmon@xxxxxxxxxx> Signed-off-by: Josh Steadmon <steadmon@xxxxxxxxxx> @@ Makefile: clean: profile-clean coverage-clean cocciclean $(RM) Documentation/GIT-EXCLUDED-PROGRAMS - $(RM) -r contrib/libgit-rs/libgit-sys/target + $(RM) -r contrib/libgit-rs/target contrib/libgit-rs/libgit-sys/target - ifndef NO_PERL - $(RM) -r perl/build/ - endif + $(RM) -r contrib/libgit-rs/libgit-sys/partial_symbol_export.o + $(RM) -r contrib/libgit-rs/libgit-sys/hidden_symbol_export.o + $(RM) -r contrib/libgit-rs/libgit-sys/libgitpub.a ## contrib/libgit-rs/Cargo.lock (new) ## @@ @@ contrib/libgit-rs/src/lib.rs (new) + } + } + -+ pub fn get_int(&mut self, key: &str) -> Option<c_int> { ++ pub fn get_int(&mut self, key: &str) -> Option<i32> { + let key = CString::new(key).expect("Couldn't convert to CString"); + let mut val: c_int = 0; + unsafe { @@ contrib/libgit-rs/src/lib.rs (new) + } + } + -+ Some(val) ++ Some(val.into()) + } + + pub fn get_string(&mut self, key: &str) -> Option<String> { -: ---------- > 5: 15ce989de8 Makefile: add option to build and test libgit-rs and libgit-rs-sys base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9 -- 2.47.0.rc1.288.g06298d1525-goog