Re: [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls

Petr Mladek <pmladek@xxxxxxxx> · Wed, 27 Sep 2023 17:00:24 +0200

On Wed 2023-09-20 01:14:55, John Ogness wrote:
> Invoke the atomic write enforcement functions for rcu stalls to
> ensure that the information gets out to the consoles.
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
> 
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> ---
>  kernel/rcu/tree_stall.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 6f06dc12904a..0a58f8b233d8 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -8,6 +8,7 @@
>   */
>  
>  #include <linux/kvm_para.h>
> +#include <linux/console.h>
>  
>  //////////////////////////////////////////////////////////////////////////////
>  //
> @@ -582,6 +583,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
>  
>  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  {
> +	enum nbcon_prio prev_prio;
>  	int cpu;
>  	unsigned long flags;
>  	unsigned long gpa;
> @@ -597,6 +599,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>  
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
>  	/*
>  	 * OK, time to rat on our buddy...
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug
> @@ -651,6 +655,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	panic_on_rcu_stall();
>  
>  	rcu_force_quiescent_state();  /* Kick them all. */
> +
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

The locations look reasonable to me. I just hope that we will end up
using another API: nbcon_emergency_enter()/exit().

Note that the new API would allow flushing the messages in
the emergency context immediately from printk().

In that case, we would need to handle nmi_trigger_cpumask_backtrace()
in a special way.

This function would be called from the emergency context but
the nmi_cpu_backtrace() callbacks would be called on other
CPUs in normal context.

For this case I would add something like:

void nbcon_flush_all_emergency(void)
{
	enum nbcon_prio prio = nbcon_get_default_prio();

	/* Flush only when the calling CPU is in emergency (or panic) context. */
	if (prio >= NBCON_PRIO_EMERGENCY)
		nbcon_flush_all();
}

, where the POC of nbcon_get_default_prio() and nbcon_flush_all()
was in the reply to the 7th patch, see
https://lore.kernel.org/all/ZRLBxsXPCym2NC5Q@alley/
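
To illustrate the intended behavior, here is a userspace-only model of
the idea. It is a sketch, not kernel code: the enum values are assumed
to be ordered as in the nbcon series, the model_* variables are
hypothetical stand-ins for per-CPU state, and nbcon_flush_all() only
counts calls instead of touching any console. The point is that a CPU
in normal context (an nmi_cpu_backtrace() callback) does not flush,
while the CPU that triggered the backtrace from the emergency context
does.

```c
#include <assert.h>

/* Assumed priority ordering; only the relative order matters here. */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
};

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for per-CPU state and the "current" CPU. */
enum nbcon_prio model_cpu_prio[MODEL_NR_CPUS] = {
	NBCON_PRIO_NORMAL, NBCON_PRIO_NORMAL,
	NBCON_PRIO_NORMAL, NBCON_PRIO_NORMAL,
};
int model_current_cpu;
int model_flush_count;

/* Model: return the default priority of the CPU we pretend to run on. */
enum nbcon_prio nbcon_get_default_prio(void)
{
	assert(model_current_cpu >= 0 && model_current_cpu < MODEL_NR_CPUS);
	return model_cpu_prio[model_current_cpu];
}

/* Model: a real implementation would flush all nbcon consoles. */
void nbcon_flush_all(void)
{
	model_flush_count++;
}

/* The helper proposed above: flush only from emergency or panic context. */
void nbcon_flush_all_emergency(void)
{
	enum nbcon_prio prio = nbcon_get_default_prio();

	if (prio >= NBCON_PRIO_EMERGENCY)
		nbcon_flush_all();
}
```

In this model, calling nbcon_flush_all_emergency() while pretending to
be a CPU at NBCON_PRIO_NORMAL leaves model_flush_count unchanged; after
raising one CPU to NBCON_PRIO_EMERGENCY and calling it there, the
counter is incremented once.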

Best Regards,
Petr