On Mon, Dec 30, 2024 at 03:24:02PM +0100, Patrick Steinhardt wrote: > Same as with the preceding commit, git-fast-import(1) is using the safe > variant to initialize a hashfile checkpoint. This leads to a segfault > when passing the checkpoint into the hashfile subsystem because it would > use the unsafe variants instead: > > ++ git --git-dir=R/.git fast-import --big-file-threshold=1 > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0) > ==577126==The signal is caused by a READ memory access. > ==577126==Hint: address points to the zero page. > #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) > #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2 > #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2 > #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2 > #4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2 > #5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3 > #6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5 > #7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4 > #8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4 > #9 0x555555b1f493 in run_builtin ../git.c:480:11 > #10 0x555555b1bfef in handle_builtin ../git.c:740:9 > #11 0x555555b1e6f4 in run_argv ../git.c:807:4 > #12 0x555555b1b87a in cmd_main ../git.c:947:19 > #13 0x5555561649e6 in main ../common-main.c:64:11 > #14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) > #15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) > #16 0x555555772c84 in _start (git+0x21ec84) Wow. This is a really good catch, and I'm embarrassed to have introduced the bug in the first place. If I can state back to you what I think is going on to make sure that we're on the same page... In fast-import, we initialize a hashfile_checkpoint struct using regular the_hash_algo callbacks, like the_hash_algo->init_fn(), the_hash_algo->update_fn(), etc. But we also have a hashfile, which will use the unsafe variants of these functions. If we're using a specialized unsafe variant, then calling hashfile_checkpoint() will crash. In your example, the crash happens when we try and call the clone_fn, but I think it could also happen with hashflush() when it tries to call update_fn(). I thought that my series here: https://lore.kernel.org/git/cover.1732130001.git.me@xxxxxxxxxxxx/ would have prevented this, but I don't think that it would have in its current state. A little more on that in a second... > --- > builtin/fast-import.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/builtin/fast-import.c b/builtin/fast-import.c > index 1fa2929a01b7dfee52b653248bba802884f6be6a..0f86392761abbe6acb217fef7f4fe7c3ff5ac1fa 100644 > --- a/builtin/fast-import.c > +++ b/builtin/fast-import.c > @@ -1106,7 +1106,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) > || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) > cycle_packfile(); > > - the_hash_algo->init_fn(&checkpoint.ctx); > + the_hash_algo->unsafe_init_fn(&checkpoint.ctx); This will obviously fix the issue at hand, but I don't think this is any less brittle than before. The hash function implementation here needs to agree with that used in the hashfile API. This change makes that happen, but only using side information that the hashfile API uses the unsafe variants. The series I mentioned above made the algop a property of the hashfile itself, and introduced a specialized "unsafe" variant that did not require using separate callbacks. So that on its own would not have saved us here (or in the other spot from the previous patch). But something on top like this would have: --- 8< --- diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 76d5c20f14..89fc20d436 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -1101,7 +1101,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) || (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size) cycle_packfile(); - the_hash_algo->init_fn(&checkpoint.ctx); + pack_file->algop->init_fn(&checkpoint.ctx); hashfile_checkpoint(pack_file, &checkpoint); offset = checkpoint.offset; --- >8 --- (and an analogous change to the other spot). I think we should perhaps combine forces here. My ideal end-state is to have the unsafe_hash_algo() stuff land from my earlier series, then have these two fixes (adjusted to the new world order as above), and finally the Meson fixes after that. Does that seem like a plan to you? If so, I can put everything together and send it out (if you're OK with me forging your s-o-b). Thanks, Taylor