Re: [PATCH v3] printk: Add console owner and waiter logic to load balance console writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(Ccing Tejun)

On (11/07/17 10:40), Sergey Senozhatsky wrote:
> On (11/06/17 21:06), Tetsuo Handa wrote:
> > I tried your patch with warn_alloc() torture. It did not cause lockups.
> > But I felt that possibility of failing to flush last second messages (such
> > as SysRq-c or SysRq-b) to consoles has increased. Is this psychological?
> 
> do I understand it correctly that there are "lost messages"?
> 
> sysrq-b does an immediate emergency reboot. "normally" it's not expected
> to flush any pending logbuf messages because it's an emergency-reboot...
> but in fact it does. and this is why sysrq-b is not 100% reliable:
> 
> 	__handle_sysrq()
> 	{
> 	  pr_info("SysRq : ");
> 
> 	  op_p = __sysrq_get_key_op(key);
> 	  pr_cont("%s\n", op_p->action_msg);
> 
> 	    op_p->handler(key);
> 
> 	  pr_cont("\n");
> 	}
> 
> those pr_info()/pr_cont() calls can spoil sysrq-b, depending on how
> badly the system is screwed. if pr_info() deadlocks, then we never
> go to op_p->handler(key)->emergency_restart(). even if you suppress
> printing of info loglevel messages, pr_info() still goes to
> console_unlock() and prints [console_seq, log_next_seq] messages,
> if there any.
> 
> there is, however, a subtle behaviour change, I think.
> 
> previously, in some cases [?], pr_info("SysRq : ") from __handle_sysrq()
> would flush logbuf messages. now we have that "break out of console_unlock()
> loop even though there are pending messages, there is another CPU doing
> printk()". so sysrb-b instead of looping in console_unlock() goes directly
> to emergency_restart(). without the change it would have continued looping
> in console_unlock() and would have called emergency_restart() only when
> "console_seq == log_next_seq".
> 
> now... the "subtle" part here is that we had that thing:
> 	- *IF* __handle_sysrq() grabs the console_sem then it will not
> 	  return from console_unlock() until logbuf is empty. so
> 	  concurrent printk() messages won't get lost.
> 
> what we have now is:
> 	- if there are concurrent printk() then __handle_sysrq() does not
> 	  fully flush the logbuf *even* if it grabbed the console_sem.

the change goes further. I did express some of my concerns during the KS,
I'll just bring them to the list.


we now always shift printing from a save - scheduleable - context to
a potentially unsafe one - atomic. by example:

CPU0			CPU1~CPU10	CPU11

console_lock()

			printk();

console_unlock()			IRQ
 set console_owner			printk()
					 sees console_owner
					 set console_waiter
 sees console_waiter
 break
					 console_unlock()
					 ^^^^ lockup [?]


so we are forcibly moving console_unlock() from safe CPU0 to unsafe CPU11.
previously we would continue printing from a schedulable context.


another case. bare with me.

suppose that call_console_drivers() is slower than printk() -> log_store(),
which is often the case.

now assume the following:

CPU0				CPU1

IRQ				IRQ

printk()			printk()
printk()			printk()
printk()			printk()


which probably could have been handled something like this:

CPU0				CPU1

IRQ				IRQ

printk()			printk()
 log_store()
				 log_store()
 console_unlock()
  call_console_drivers()
				printk()
				 log_store()
 goto again;
  call_console_drivers()
				printk()
				 log_store()
 goto again;
  call_console_drivers()
printk()
 log_store()
  console_unlock()
   call_console_drivers()
printk()
 log_store()
  console_unlock()
   call_console_drivers()


so CPU0 printed all the messages.
CPU1 simply did 3 * log_store()
	// + spent some cycles on logbuf_lock spin_lock
	// + console_sem trylock


but now every CPU will do call_console_drivers() + busy loop.


CPU0				CPU1

IRQ				IRQ

printk()			printk()
 log_store()
				 log_store()
 console_unlock()
  set console_owner
				 sees console_owner
				 sets console_waiter
				 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
				 console_unlock()
				  set console_owner
 sees console_owner
 sets console_waiter
 spin
				 call_console_drivers()
				 sees console_waiter
				  break

				printk()
				 log_store()
 console_unlock()
  set console_owner
				 sees console_owner
				 sets console_waiter
				 spin
  call_console_drivers()
  sees console_waiter
  break

printk()
 log_store()
				 console_unlock()
				  set console_owner
 sees console_owner
 sets console_waiter
 spin

				.... and so on

which not only brings the cost of call_console_drivers() from
CPU's own printk(), but it also brings the cost [busy spin] of
call_console_drivers() happening on _another_ CPU. am I wrong?

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux