Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction

Patrick Steinhardt <ps@xxxxxx> · Thu, 1 Aug 2024 10:41:04 +0200

On Wed, Jul 31, 2024 at 08:04:17PM -0500, Justin Tobler wrote:
> On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> > But there is a bug in the code. Suppose we have two processes which are
> > compacting two slices of the table. Given that we lock each of the
> > tables before compacting them, we know that the slices must be disjunct
> > from each other. But regardless of that, compaction performed by one
> > process will always impact what the other process needs to write to the
> > "tables.list" file.
> 
> I'm not quite sure I understand at this point how it is possible for two
> compaction operations to be performed concurrently. Wouldn't there
> always be overlap between the two compaction segments thus causing one
> of the operations to be unable to acquire all of the required locks and
> abort?

In practice we cannot assume anything about how another process compacts
tables. While we can assume something about how a particular version of
Git compacts tables, we cannot assume anything about future versions of
Git or about alternate implementations of Git. The reftable backend
allows for compacting only a subset of tables, and the heuristic is not
mandated by the on-disk format except that the tables that we are about
to compact need to be next to each other in the stack.

Furthermore, with the next patch, we also handle it gracefully when some
parts of the stack are locked already. Thus, it can easily happen that
process A compacts tables 1 to 3, whereas process B will try to compact
tables 1 to 5, fail to acquire the lock for table 3, and then reduce the
range to compact to 3 to 5.

> > changed after we have locked it for the second time in (5). This has the
> > consequence that we will always commit the old, cached in-core tables to
> > disk without paying to respect what the other process has written. This
> > scenario would then lead to data loss and corruption.
> 
> If a concurrent compaction happens though, it would mess up the indices
> and cause problems when writting the "tables.list" file. That would not
> be good.

Yup.

> > This can even happen in the simpler case of one compacting process and
> > one writing process. The newly-appended table by the writing process
> > would get discarded by the compacting process because it never sees the
> > new table.
> 
> This is indeed a problem. Since we don't reload the stack, we are
> unaware of any concurrently append tables causing them to not be
> written in the new "tables.list" file. Scary

Indeed.

> > +		/*
> > +		 * We have found the new range that we want to replace, so
> > +		 * let's update the range of tables that we want to replace.
> > +		 */
> > +		last_to_replace = last + (new_offset - first);
> > +		first_to_replace = new_offset;
> > +	} else {
> > +		REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
> 
> I was confused at first by the `stack_len` + 1. The extra element is
> NULL which tells us there are no more tables to add to the list,
> correct? It looks like `fd_read_lines()` also adds an extra element.

Yes, that's the reason why we have it. We end up passing `names` to
`free_names()`, which uses `NULL` as a sentinel value to know when to
stop iterating over the array's entries.

I'll add a comment.

Thanks for your review. I'll wait a bit longer before sending out
another version of this patch series to wait for some more feedback.

Patrick
Attachment:
signature.asc

Description: PGP signature