On 2019-02-13, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> Add processor-reentrant spin locking functions. These allow
>> restricting the number of possible contexts to 2, which can simplify
>> implementing code that also supports NMI interruptions.
>>
>>   prb_lock();
>>
>>   /*
>>    * This code is synchronized with all contexts
>>    * except an NMI on the same processor.
>>    */
>>
>>   prb_unlock();
>>
>> In order to support printk's emergency messages, a
>> processor-reentrant spin lock will be used to control raw access to
>> the emergency console. However, it must be the same
>> processor-reentrant spin lock as the one used by the ring buffer,
>> otherwise a deadlock can occur:
>>
>>   CPU1: printk lock -> emergency -> serial lock
>>   CPU2: serial lock -> printk lock
>>
>> By making the processor-reentrant implementation available externally,
>> printk can use the same atomic_t for the ring buffer as for the
>> emergency console and thus avoid the above deadlock.
>
> Interesting idea. I just wonder if it might cause some problems
> when it is shared between printk() and many other consoles.
>
> It sounds like the big kernel lock or console_lock. They
> both caused many troubles.

It causes big troubles (deadlocks) if you have multiple instances of
it, mainly because printk can be called from _any_ line of code in the
kernel. That is the reason I decided that it needs to be shared and
only used in call chains that are carefully constructed, such as:

  printk -> write_atomic

and NMI contexts are _never_ allowed to do things that rely on waiting
forever for other CPUs. For that reason it does kinda feel like a BKL.

If we do find some problems, we may want to switch to a ringbuffer
implementation that is fully lockless for both multi-readers and
multi-writers. I have written such a beast, but it is less efficient
and more complex than the ringbuffer in this series. Also, that only
shrinks the window, since write_atomic would still need to make use of
the processor-reentrant spinlock to synchronize the console output.
That's why I decided to RFC with the simpler ringbuffer implementation.

>> diff --git a/lib/printk_ringbuffer.c b/lib/printk_ringbuffer.c
>> new file mode 100644
>> index 000000000000..28958b0cf774
>> --- /dev/null
>> +++ b/lib/printk_ringbuffer.c
>> @@ -0,0 +1,77 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/smp.h>
>> +#include <linux/printk_ringbuffer.h>
>> +
>> +static bool __prb_trylock(struct prb_cpulock *cpu_lock,
>> +                          unsigned int *cpu_store)
>> +{
>> +        unsigned long *flags;
>> +        unsigned int cpu;
>> +
>> +        cpu = get_cpu();
>> +
>> +        *cpu_store = atomic_read(&cpu_lock->owner);
>> +        /* memory barrier to ensure the current lock owner is visible */
>> +        smp_rmb();
>> +        if (*cpu_store == -1) {
>> +                flags = per_cpu_ptr(cpu_lock->irqflags, cpu);
>> +                local_irq_save(*flags);
>> +                if (atomic_try_cmpxchg_acquire(&cpu_lock->owner,
>> +                                               cpu_store, cpu)) {
>> +                        return true;
>> +                }
>> +                local_irq_restore(*flags);
>> +        } else if (*cpu_store == cpu) {
>> +                return true;
>> +        }
>> +
>> +        put_cpu();
>
> Is there any reason why you get/put CPU and enable/disable
> in each iteration?
>
> It is a spin lock after all. We do busy waiting anyway. This looks like
> burning CPU power for no real gain. Simple cpu_relax() should be
> enough.

Agreed.

>> +        return false;
>> +}
>> +
>> +/*
>> + * prb_lock: Perform a processor-reentrant spin lock.
>> + * @cpu_lock: A pointer to the lock object.
>> + * @cpu_store: A "flags" pointer to store lock status information.
>> + *
>> + * If no processor has the lock, the calling processor takes the lock and
>> + * becomes the owner. If the calling processor is already the owner of the
>> + * lock, this function succeeds immediately. If the lock is locked by
>> + * another processor, this function spins until the calling processor
>> + * becomes the owner.
>> + *
>> + * It is safe to call this function from any context and state.
>> + */
>> +void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store)
>> +{
>> +        for (;;) {
>> +                if (__prb_trylock(cpu_lock, cpu_store))
>> +                        break;
>> +                cpu_relax();
>> +        }
>> +}
>> +
>> +/*
>> + * prb_unlock: Perform a processor-reentrant spin unlock.
>> + * @cpu_lock: A pointer to the lock object.
>> + * @cpu_store: A "flags" object storing lock status information.
>> + *
>> + * Release the lock. The calling processor must be the owner of the lock.
>> + *
>> + * It is safe to call this function from any context and state.
>> + */
>> +void prb_unlock(struct prb_cpulock *cpu_lock, unsigned int cpu_store)
>> +{
>> +        unsigned long *flags;
>> +        unsigned int cpu;
>> +
>> +        cpu = atomic_read(&cpu_lock->owner);
>> +        atomic_set_release(&cpu_lock->owner, cpu_store);
>> +
>> +        if (cpu_store == -1) {
>> +                flags = per_cpu_ptr(cpu_lock->irqflags, cpu);
>> +                local_irq_restore(*flags);
>> +        }
>
> cpu_store looks like an implementation detail. The caller
> needs to remember it to handle the nesting properly.

It's really no different from "flags" in irqsave/irqrestore.

> We could achieve the same with a recursion counter hidden
> in struct prb_lock.

The only way I see how that could be implemented is if the cmpxchg
encoded the cpu owner and the counter into a single integer (upper half
as counter, lower half as cpu owner). Both fields would need to be
updated with a single cmpxchg. The critical cmpxchg is the one where
the CPU becomes unlocked (the counter goes from 1 to 0 and the cpu
owner goes from N to -1).

That seems like a lot of extra code just to avoid passing a "flags"
argument.

John Ogness
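
P.S. For illustration only, here is a rough, untested sketch of what
that counter/owner encoding could look like. The state-word layout, the
PRB_* macros and the prb_cpulock2 structure are made up for this sketch
and are not part of the series:

#include <linux/atomic.h>
#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/smp.h>

/*
 * State word layout (an assumption for this sketch):
 *   bits 31..16: recursion count
 *   bits 15..0 : owner CPU, PRB_NO_OWNER meaning "unowned"
 */
#define PRB_NO_OWNER        0xffff
#define PRB_STATE(cnt, cpu) (((cnt) << 16) | ((cpu) & 0xffff))

struct prb_cpulock2 {
        /* must be initialized to PRB_STATE(0, PRB_NO_OWNER) */
        atomic_t state;
        unsigned long __percpu *irqflags;
};

void prb_lock2(struct prb_cpulock2 *l)
{
        unsigned long *flags;
        unsigned int cpu;
        int old;

        for (;;) {
                cpu = get_cpu();
                old = atomic_read(&l->state);

                if ((old & 0xffff) == cpu) {
                        /*
                         * Already the owner. Contexts on this CPU nest
                         * strictly (interrupt/NMI), so a plain
                         * read-modify-write of the count is enough.
                         */
                        atomic_set(&l->state, old + PRB_STATE(1, 0));
                        return;
                }

                if ((old & 0xffff) == PRB_NO_OWNER) {
                        flags = per_cpu_ptr(l->irqflags, cpu);
                        local_irq_save(*flags);
                        /* take ownership and set the count in one go */
                        if (atomic_try_cmpxchg_acquire(&l->state, &old,
                                                       PRB_STATE(1, cpu)))
                                return;
                        local_irq_restore(*flags);
                }

                put_cpu();
                cpu_relax();
        }
}

void prb_unlock2(struct prb_cpulock2 *l)
{
        int old = atomic_read(&l->state);
        unsigned int cpu = old & 0xffff;

        if (old == PRB_STATE(1, cpu)) {
                /* last unlock: drop ownership, then restore interrupts */
                atomic_set_release(&l->state, PRB_STATE(0, PRB_NO_OWNER));
                local_irq_restore(*per_cpu_ptr(l->irqflags, cpu));
        } else {
                /* nested unlock: only decrement the count */
                atomic_set(&l->state, old - PRB_STATE(1, 0));
        }

        put_cpu();
}

The callers lose the "flags" argument, but the lock/unlock paths grow
noticeably, which is the trade-off I mean above.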