Patrick Steinhardt <ps@xxxxxx> writes:

> When compacting tables, it may happen that we want to compact a set of
> tables which are already locked by a concurrent process that compacts
> them. In the case where we wanted to perform a full compaction of all
> tables it is sensible to bail out in this case, as we cannot fulfill the
> requested action.
>
> But when performing auto-compaction it isn't necessarily in our best
> interest of us to abort the whole operation. For example, due to the
> geometric compacting schema that we use, it may be that process A takes
> a lot of time to compact the bulk of all tables whereas process B
> appends a bunch of new tables to the stack. B would in this case also
> notice that it has to compact the tables that process A is compacting
> already and thus also try to compact the same range, probably including
> the new tables it has appended. But because those tables are locked
> already, it will fail and thus abort the complete auto-compaction. The
> consequence is that the stack will grow longer and longer while A isn't
> yet done with compaction, which will lead to a growing performance
> impact.
>
> Instead of aborting auto-compaction altogether, let's gracefully handle
> this situation by instead compacting tables which aren't locked. To do
> so, instead of locking from the beginning of the slice-to-be-compacted,
> we start locking tables from the end of the slice. Once we hit the first
> table that is locked already, we abort. If we succeded to lock two or

s/succeded/succeeded

> more tables, then we simply reduce the slice of tables that we're about
> to compact to those which we managed to lock.
>
> This ensures that we can at least make some progress for compaction in
> said scenario. It also helps in other scenarios, like for example when a
> process died and left a stale lockfile behind. In such a case we can at
> least ensure some compaction on a best-effort basis.

Right. This is really well written.
[snip]

> @@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
>  	/*
>  	 * Lock all tables in the user-provided range. This is the slice of our
>  	 * stack which we'll compact.
> +	 *
> +	 * Note that we lock tables in reverse order from last to first. The
> +	 * intent behind this is to allow a newer process to perform best
> +	 * effort compaction of tables that it has added in the case where an
> +	 * older process is still busy compacting tables which are preexisting
> +	 * from the point of view of the newer process.
>  	 */
>  	REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> -	for (i = first; i <= last; i++) {
> -		stack_filename(&table_name, st, reader_name(st->readers[i]));
> +	for (i = last + 1; i > first; i--) {
> +		stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>
>  		err = hold_lock_file_for_update(&table_locks[nlocks],
>  						table_name.buf, LOCK_NO_DEREF);
>  		if (err < 0) {
> -			if (errno == EEXIST)
> +			/*
> +			 * When the table is locked already we may do a
> +			 * best-effort compaction and compact only the tables
> +			 * that we have managed to lock so far. This of course
> +			 * requires that we have been able to lock at least two
> +			 * tables, otherwise there would be nothing to compact.
> +			 * In that case, we return a lock error to our caller.
> +			 */
> +			if (errno == EEXIST && last - (i - 1) >= 2 &&
> +			    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> +				err = 0;
> +				/*
> +				 * The subtraction is to offset the index, the
> +				 * addition is to only compact up to the table
> +				 * of the preceding iteration. They obviously
> +				 * cancel each other out, but that may be
> +				 * non-obvious when it was omitted.
> +				 */
> +				first = (i - 1) + 1;

I remember thinking about this and how to fix it; I'm happy with how
you've done it. It's simple and sufficient. Kudos.

[snip]

Overall this patch looked great. I have nothing to add. I really like
that we use a flag, which means this only comes into play for
auto-compaction.