On 17.12.21 22:28, Thomas Gleixner wrote:
> On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote:
>> Sometimes it is required to have a seqcount implementation that uses
>> a structure with a fixed and minimal size -- just a bare unsigned int --
>> independent of the kernel configuration. This is especially valuable when
>> the raw_ variants of the seqlock functions will be used and the additional
>> lockdep part of the seqcount_t structure remains essentially unused.
>>
>> Let's provide a lockdep-free raw_seqcount_t variant that can be used via
>> the raw functions to have a basic seqlock.
>>
>> The target use case is embedding a raw_seqcount_t in the "struct page",
>> where we really want a minimal size and cannot tolerate a sudden growth
>> of the seqcount_t structure resulting in a significant "struct page"
>> increase or even a layout change.
>

Hi Thomas,

thanks for your feedback!

> Cannot tolerate? Could you please provide a reason and not just a
> statement?

Absolutely. "struct page" is supposed to have a minimal size with a
fixed layout. Embedding something inside such a structure can change
the fixed layout in a way that completely breaks any assumptions about
the location of values. Therefore, embedding a complex structure in it
is usually avoided -- and if we have to (spin_lock), we work around
sudden size increases.

There are ways around it: allocate the lock separately and only store a
pointer to it in the struct page. But that most certainly adds
complexity, which is why I want to avoid it for now.

I'll extend that answer and add it to the patch description.

>
>> Provide raw_read_seqcount_retry(), to make it easy to match to
>> raw_read_seqcount_begin() in the code.
>>
>> Let's add a short documentation as well.
>>
>> Note: There might be other possible users for raw_seqcount_t where the
>> lockdep part might be completely unused and just wastes memory --
>> essentially any users that only use the raw_ function variants.
>
> Even when the reader side uses raw_seqcount_begin/retry(), the writer
> side can still use the non-raw variant, which validates that the
> associated lock is held on write.

Yes, that's my understanding as well.

>
> Aside of that, your proposed extra raw sequence count needs extra care
> vs. PREEMPT_RT, and this wants to be very clearly documented.

Why?

>
> The lock association has two purposes:
>
>   1) Lockdep coverage, which unearthed bugs already

Yes, that's a real shame to lose.

>
>   2) PREEMPT_RT livelock prevention
>
>      Assume the following:
>
>        spin_lock(wrlock);
>        write_seqcount_begin(seq);
>
>        ---> preemption by a high priority reader
>
>        seqcount_begin(seq); <-- live lock
>
>      The RT substitution does:
>
>        seqcount_begin(seq)
>          cnt = READ_ONCE(seq->sequence);
>
>          if (cnt & 1) {
>                  lock(s->lock);
>                  unlock(s->lock);
>          }
>
>      which prevents the deadlock because it makes the reader block on
>      the associated lock, which allows the preempted writer to make
>      progress.
>
>      This applies to raw_seqcount_begin() as well.
>
> I have no objections against the construct itself, but this has to be
> properly documented vs. the restriction this imposes.

Absolutely, any input is highly appreciated.

But to mention it again: whatever you can do with raw_seqcount_t, you
can do with seqcount_t, and there are already users relying completely
on the raw_ function variants (see my other reply). So the
documentation should most probably be extended to cover the raw_
functions and seqcount_t in general.
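To make the raw_-only usage a bit more concrete, here is a rough sketch
(the foo_* names are made up for illustration, and I'm assuming the
raw_write_* helpers accept the new type just like the raw_read_* ones;
the initializer macro is omitted):

  /* Minimal seqcount protecting a single value: no lockdep, no lock. */
  static raw_seqcount_t foo_seqcount;	/* assumed zero-initialized */
  static unsigned long foo_value;

  static unsigned long foo_read(void)
  {
          unsigned int seq;
          unsigned long val;

          do {
                  /*
                   * Raw variants: no lockdep, and on PREEMPT_RT there is
                   * no associated lock the reader could block on.
                   */
                  seq = raw_read_seqcount_begin(&foo_seqcount);
                  val = foo_value;
          } while (raw_read_seqcount_retry(&foo_seqcount, seq));

          return val;
  }

  static void foo_write(unsigned long val)
  {
          /*
           * The writer has to make sure it cannot be preempted here,
           * otherwise a higher-priority reader can livelock on RT.
           */
          raw_write_seqcount_begin(&foo_seqcount);
          foo_value = val;
          raw_write_seqcount_end(&foo_seqcount);
  }

The comment in the writer is exactly the restriction you describe:
without an associated lock, the only way to keep RT readers from
spinning forever is to not get preempted inside the write section.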
>
> As you can see above, the writer side therefore has to ensure that it
> cannot be preempted on PREEMPT_RT, which limits the possibilities of
> what you can do inside a preemption (or interrupt) disabled section on
> RT enabled kernels. See Documentation/locking/locktypes.rst for further
> information.

It's going to be used for THP, which are currently incompatible with
PREEMPT_RT (disabled in the Kconfig). But preemption is also disabled
because we're using bit_spin_lock(), which does a preempt_disable().

Certainly worth documenting! Thanks for your input!

-- 
Thanks,

David / dhildenb