Patrick Steinhardt <ps@xxxxxx> writes:

> When compacting tables, it may happen that we want to compact a set of
> tables which are already locked by a concurrent process that compacts
> them. In the case where we wanted to perform a full compaction of all
> tables it is sensible to bail out in this case, as we cannot fulfill the
> requested action.
>
> But when performing auto-compaction it isn't necessarily in our best
> interest of us to abort the whole operation. For example, due to the
> geometric compacting schema that we use, it may be that process A takes
> a lot of time to compact the bulk of all tables whereas process B
> appends a bunch of new tables to the stack. B would in this case also
> notice that it has to compact the tables that process A is compacting
> already and thus also try to compact the same range, probably including
> the new tables it has appended. But because those tables are locked
> already, it will fail and thus abort the complete auto-compaction. The
> consequence is that the stack will grow longer and longer while A isn't
> yet done with compaction, which will lead to a growing performance
> impact.
>
> Instead of aborting auto-compaction altogether, let's gracefully handle
> this situation by instead compacting tables which aren't locked. To do
> so, instead of locking from the beginning of the slice-to-be-compacted,
> we start locking tables from the end of the slice. Once we hit the first
> table that is locked already, we abort. If we succeded to lock two or

s/succeded/succeeded

> more tables, then we simply reduce the slice of tables that we're about
> to compact to those which we managed to lock.
>
> This ensures that we can at least make some progress for compaction in
> said scenario. It also helps in other scenarios, like for example when a
> process died and left a stale lockfile behind. In such a case we can at
> least ensure some compaction on a best-effort basis.

Right. This is really well written.
[snip]

> @@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
>  	/*
>  	 * Lock all tables in the user-provided range. This is the slice of our
>  	 * stack which we'll compact.
> +	 *
> +	 * Note that we lock tables in reverse order from last to first. The
> +	 * intent behind this is to allow a newer process to perform best
> +	 * effort compaction of tables that it has added in the case where an
> +	 * older process is still busy compacting tables which are preexisting
> +	 * from the point of view of the newer process.
>  	 */
>  	REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> -	for (i = first; i <= last; i++) {
> -		stack_filename(&table_name, st, reader_name(st->readers[i]));
> +	for (i = last + 1; i > first; i--) {
> +		stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>
>  		err = hold_lock_file_for_update(&table_locks[nlocks],
>  						table_name.buf, LOCK_NO_DEREF);
>  		if (err < 0) {
> -			if (errno == EEXIST)
> +			/*
> +			 * When the table is locked already we may do a
> +			 * best-effort compaction and compact only the tables
> +			 * that we have managed to lock so far. This of course
> +			 * requires that we have been able to lock at least two
> +			 * tables, otherwise there would be nothing to compact.
> +			 * In that case, we return a lock error to our caller.
> +			 */
> +			if (errno == EEXIST && last - (i - 1) >= 2 &&
> +			    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> +				err = 0;
> +				/*
> +				 * The subtraction is to offset the index, the
> +				 * addition is to only compact up to the table
> +				 * of the preceding iteration. They obviously
> +				 * cancel each other out, but that may be
> +				 * non-obvious when it was omitted.
> +				 */
> +				first = (i - 1) + 1;

I remember thinking about this and how to fix it; I'm happy with how
you've done it. It's simple and sufficient. Kudos.

[snip]

Overall this patch looked great. I have nothing to add. I really like
that we use a flag, which means this only comes into play for
auto-compaction.