Re: [PATCH 3/4] reftable/stack: register lockfiles during compaction

Justin Tobler <jltobler@xxxxxxxxx> · Wed, 6 Mar 2024 13:57:23 -0600

On 24/03/06 12:59PM, Patrick Steinhardt wrote:

> > Something not really related to this patch, but I noticed and had a
> > question about.
> > 
> > If I'm understanding this correctly, when a newly compacted table is
> > empty, it becomes possible for a range of indexes to no longer exist 
> > within the stack. If this occurs in the middle of the stack, future
> > compaction will likely combine the tables on either side and restore the
> > missing index range. If the empty table was at the end of the stack,
> > would this effectly reset the max index to something lower for future
> > tables written to the stack? If so, could this lead to issues with
> > separate concurrent table writes?
> 
> Very good question indeed, but I think we should be fine here. This is
> mostly because concurrent writers will notice when "tables.list" has
> changed, and, if so, abort the transaction with an out-of-date error.
> 
> A few scenarios with concurrent processes, one process which compacts
> the stack (C) and one which modifies it (M):
> 
>   - M acquires the lock before C compacts: M sees the whole stack and
>     uses the latest update index to update it, resulting in a newly
>     written table. When C then locks afterwards, it may decide to
>     compact and drop some tables in the middle of the stack. This may
>     lead to a gap in update indices, but this is fine.
> 
>   - M acquires the lock while C compacts: M sees the whole stack and
>     uses the latest update index to update the stack. C then acquires
>     the lock to write the merged tables, notices that its compacted
>     tables still exist and are in the same order, and thus removes them.
>     We now have a gap in update indices, but this is totally fine.
> 
>   - M acquires the lock after C compacts: M will refresh "tables.list"
>     after it has acquired the lock itself. Thus, it won't ever see the
>     now-dropped empty table.
> 
> M cannot write its table when C has the "tables.list" lock, so this
> scenario cannot happen. In the same spirit, two Ms cannot race with each
> other either as only one can have the "tables.list" lock, and the other
> one would abort with an out-of-date error when it has subsequently
> acquired the lock and found the "tables.list" contents to have been
> updated concurrently.

Thanks Patrick for the great explanation! Digging into this a bit
further, I see that we return `REFTABLE_LOCK_ERROR` when the list file
lock already exists or has changed when attempting to add a new table to
the list. 

When performing compaction in `stack_compact_range()`, after initially
acquiring the table list lock, we also check if the stack is up-to-date
with `stack_uptodate()`. I noticed that this check is not performed
again after the table list is locked for the second time. At first I
thought this could be problematic, but I realized that this would only
be an issue for concurrent compactions and because the tables are locked
it should not matter.

-Justin