Patrick Steinhardt <ps@xxxxxx> writes: > It was recently reported that concurrent reads and writes may cause the > reftable backend to segfault. The root cause of this is that we do not > properly keep track of reftable readers across reloads. > > Suppose that you have a reftable iterator and then decide to reload the > stack while iterating through the iterator. When the stack has been > rewritten since we have created the iterator, then we would end up > discarding a subset of readers that may still be in use by the iterator. > The consequence is that we now try to reference deallocated memory, > which of course segfaults. > > One way to trigger this is in t5616, where some background maintenance > jobs have been leaking from one test into another. This leads to stack > traces like the following one: > > + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp > 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) > ==657994==The signal is caused by a READ memory access. > #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 > #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 > #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 > #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 > #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 > #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 > #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 > #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 > #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 > #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 > #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 > #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 > #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 > #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 > #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 > #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 > #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 > #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 > #18 0x55f23dba7764 in run_builtin git.c:484 > #19 0x55f23dba7764 in handle_builtin git.c:741 > #20 0x55f23dbab61e in run_argv git.c:805 > #21 0x55f23dbab61e in cmd_main git.c:1000 > #22 0x55f23dba4781 in main common-main.c:64 > #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 > #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) > The stacktrace is for iterating over refs, what I don't understand is where in this flow do we actually reload the stack. > While it is somewhat awkward that the maintenance processes survive > tests in the first place, it is totally expected that reftables should > work alright with concurrent writers. Seemingly they don't. > > The only underlying resource that we need to care about in this context > is the reftable reader, which is responsible for reading a single table > from disk. These readers get discarded immediately (unless reused) when > calling `reftable_stack_reload()`, which is wrong. We can only close > them once we know that there are no iterators using them anymore. > > Prepare for a fix by converting the reftable readers to be refcounted. > Okay so my understanding is that `refcounted` refers to a reference count which keeps tracks of the stacks which are referring to the reader. The name is also used in `struct blame_origin` in blame.{c,h}. Makes a lot more sense now :) > Reported-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > reftable/reader.c | 16 ++++++++++++++-- > reftable/reader.h | 2 ++ > reftable/readwrite_test.c | 18 +++++++++--------- > reftable/reftable-reader.h | 15 ++++++++++++--- > reftable/stack.c | 8 ++++---- > reftable/stack_test.c | 6 ++---- > t/helper/test-reftable.c | 2 +- > t/unit-tests/t-reftable-merged.c | 4 ++-- > 8 files changed, 46 insertions(+), 25 deletions(-) > > diff --git a/reftable/reader.c b/reftable/reader.c > index 037417fcf6..64a0953e68 100644 > --- a/reftable/reader.c > +++ b/reftable/reader.c > @@ -621,6 +621,7 @@ int reftable_reader_new(struct reftable_reader **out, > r->source = *source; > r->name = xstrdup(name); > r->hash_id = 0; > + r->refcount = 1; > So when the reader is initialized by someone, this sets the counter to one. > err = block_source_read_block(source, &footer, r->size, > footer_size(r->version)); > @@ -645,10 +646,21 @@ int reftable_reader_new(struct reftable_reader **out, > return err; > } > > -void reftable_reader_free(struct reftable_reader *r) > +void reftable_reader_incref(struct reftable_reader *r) > +{ > + if (!r->refcount) > + BUG("cannot increment ref counter of dead reader"); > + r->refcount++; > +} > + > +void reftable_reader_decref(struct reftable_reader *r) > { > if (!r) > return; > + if (!r->refcount) > + BUG("cannot decrement ref counter of dead reader"); > + if (--r->refcount) > + return; > block_source_close(&r->source); > FREE_AND_NULL(r->name); > reftable_free(r); > @@ -812,7 +824,7 @@ int reftable_reader_print_blocks(const char *tablename) > } > > done: > - reftable_reader_free(r); > + reftable_reader_decref(r); > table_iter_close(&ti); > return err; > } > We remove the `_free()` function and instead introduce `_incref()` and `_decref()` to increase and decrease the counter. The latter takes over the functionality of `_free()` when counter hits 0. > diff --git a/reftable/reader.h b/reftable/reader.h > index 88b4f3b421..3710ee09b4 100644 > --- a/reftable/reader.h > +++ b/reftable/reader.h > @@ -50,6 +50,8 @@ struct reftable_reader { > struct reftable_reader_offsets ref_offsets; > struct reftable_reader_offsets obj_offsets; > struct reftable_reader_offsets log_offsets; > + > + uint64_t refcount; Wonder if there is a chance that we decrement refcount from 0 and hence cause a wraparound. [snip] I have some questions regarding my own understanding, but the patch looks good to me.
Attachment:
signature.asc
Description: PGP signature