Re: [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction

Karthik Nayak <karthik.188@xxxxxxxxx> · Thu, 8 Aug 2024 07:14:15 -0500

Patrick Steinhardt <ps@xxxxxx> writes:

> The locking employed by compaction uses the following schema:
>
>   1. Lock "tables.list" and verify that it matches the version we have
>      loaded in core.
>
>   2. Lock each of the tables in the user-supplied range of tables that
>      we are supposed to compact. These locks prohibit any concurrent
>      process to compact those tables while we are doing that.
>
>   3. Unlock "tables.list". This enables concurrent processes to add new
>      tables to the stack, but also allows them to compact tables outside
>      of the range of tables that we have locked.
>
>   4. Perform the compaction.
>
>   5. Lock "tables.list" again.
>
>   6. Move the compacted table into place.
>
>   7. Write the new order of tables, including the compacted table, into
>      the lockfile.
>
>   8. Commit the lockfile into place.
>

This summary helps a lot, thanks!

[snip]

> @@ -1123,6 +1125,100 @@ static int stack_compact_range(struct reftable_stack *st,
>  		}
>  	}
>
> +	/*
> +	 * As we have unlocked the stack while compacting our slice of tables
> +	 * it may have happened that a concurrently running process has updated
> +	 * the stack while we were compacting. In that case, we need to check
> +	 * whether the tables that we have just compacted still exist in the
> +	 * stack in the exact same order as we have compacted them.
> +	 *

But as per the current implementation, the tables we compacted would
always exist in tables.list, since we've obtained a lock on them.

Looking at the code below, wouldn't it be more ideal to talk about how
there are two scenarios we need to handle?
1. Stack is upto date, there we simply overwrite the stack with our
modified version.
2. Stack is not upto date, in this scenario, we need to amend the stack
without loosing out information. An extra check here is that we also see
that the tables we compact are still existing. (I don't really get, why
they wouldn't be though).

> +	 * If they do exist, then it is fine to continue and replace those
> +	 * tables with our compacted version. If they don't, then we need to
> +	 * abort.
> +	 */
> +	err = stack_uptodate(st);
> +	if (err < 0)
> +		goto done;
> +	if (err > 0) {

So this is the scenario where the stack is no longer upto date.

> +		ssize_t new_offset = -1;
> +		int fd;
> +
> +		fd = open(st->list_file, O_RDONLY);
> +		if (fd < 0) {
> +			err = REFTABLE_IO_ERROR;
> +			goto done;
> +		}
> +
> +		err = fd_read_lines(fd, &names);
> +		close(fd);
> +		if (err < 0)
> +			goto done;
> +
> +		/*
> +		 * Search for the offset of the first table that we have
> +		 * compacted in the updated "tables.list" file.
> +		 */
> +		for (size_t i = 0; names[i]; i++) {
> +			if (strcmp(names[i], st->readers[first]->name))
> +				continue;
> +
> +			/*
> +			 * We have found the first entry. Verify that all the
> +			 * subsequent tables we have compacted still exist in
> +			 * the modified stack in the exact same order as we
> +			 * have compacted them.
> +			 */
> +			for (size_t j = 1; j < last - first + 1; j++) {
> +				const char *old = first + j < st->merged->stack_len ?
> +					st->readers[first + j]->name : NULL;
> +				const char *new = names[i + j];
> +
> +				/*
> +				 * If some entries are missing or in case the tables
> +				 * have changed then we need to bail out. Again, this
> +				 * shouldn't ever happen because we have locked the
> +				 * tables we are compacting.
> +				 */

Okay, this is exactly what I was saying above. It still does makes sense
to keep this check to ensure future versions don't break it.

> +				if (!old || !new || strcmp(old, new)) {
> +					err = REFTABLE_OUTDATED_ERROR;
> +					goto done;
> +				}
> +			}
> +
> +			new_offset = i;
> +			break;
> +		}
> +
> +		/*
> +		 * In case we didn't find our compacted tables in the stack we
> +		 * need to bail out. In theory, this should have never happened
> +		 * because we locked the tables we are compacting.
> +		 */
> +		if (new_offset < 0) {
> +			err = REFTABLE_OUTDATED_ERROR;
> +			goto done;
> +		}
> +
> +		/*
> +		 * We have found the new range that we want to replace, so
> +		 * let's update the range of tables that we want to replace.
> +		 */
> +		last_to_replace = last + (new_offset - first);
> +		first_to_replace = new_offset;

Nit: might be easier to read as

  first_to_replace = new_offset;
  last_to_replace = first_to_replace + (last - first);

[snip]
Attachment:
signature.asc

Description: PGP signature