On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote: > Fix this bug by asking git-gc(1) to not detach when it is being invoked > via git-maintenance(1). Instead, git-maintenance(1) now respects a new > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and > detaches itself into the background when running as part of our auto > maintenance. This should continue to behave the same for all users which > use the git-gc(1) task, only. For others though, it means that we now > properly perform all tasks in the background. The default behaviour of > git-maintenance(1) when executed by the user does not change, it will > remain in the foreground unless they pass the `--detach` option. This patch seems to cause segfaults in t5616 when combined with the reftable backend. Try this: GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress which fails for me within a few runs. Bisecting leads to 98077d06b2 (run-command: fix detaching when running auto maintenance, 2024-08-16). It doesn't trigger with the files ref backend. Compiling with ASan gets me a stack trace like this: + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin AddressSanitizer:DEADLYSIGNAL ================================================================= ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) ==657994==The signal is caused by a READ memory access. #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 #18 0x55f23dba7764 in run_builtin git.c:484 #19 0x55f23dba7764 in handle_builtin git.c:741 #20 0x55f23dbab61e in run_argv git.c:805 #21 0x55f23dbab61e in cmd_main git.c:1000 #22 0x55f23dba4781 in main common-main.c:64 #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) My guess based on what I'm seeing and what the patch does is that now maintenance from a previous command is running in the background while our foreground git-fetch runs, and it somehow confuses things (perhaps by trying to compact reftables or something?). So I think there are two problems: 1. The reftable code needs to be more robust against whatever race is happening. I didn't dig further, but I'm sure it would be possible to produce a coredump. 2. Having racy background maintenance doesn't seem great for test robustness. At the very least, it might subject us to the "rm" problems mentioned elsewhere, where we fail to clean up. Annotating individual "git gc" or "git maintenance" calls with an extra descriptor isn't too bad, but in this case it's all happening under the hood via fetch. Is it a potential problem for every script, then? If so, should we disable background detaching for all test repos, and then let the few that want to test it turn it back on? -Peff