On (03/07/19 11:37), John Ogness wrote:
> > Sorry John, the reasoning is that I'm trying to understand
> > why this does not look like soft or hard lock-up or RCU stall
> > scenario.
>
> The reason is that you are seeing data being printed on the console. The
> watchdogs (soft, hard, rcu, nmi) are all touched with each emergency
> message.

Correct. Please see below.

> > The CPU which spins on prb_lock() can have preemption disabled and,
> > additionally, can have local IRQs disabled, or be under RCU read
> > side lock. If consoles are busy, then there are CPUs which printk()
> > data and keep prb_lock contended; prb_lock() does not seem to be
> > fair. What am I missing?
>
> You are correct. Making prb_lock fair might be something we want to look
> into. Perhaps also based on the loglevel of what needs to be
> printed. (For example, KERN_ALERT always wins over KERN_CRIT.)

Good. I'm not insisting, but I have a feeling that touching the watchdogs
after call_console_drivers() might sometimes be too late. When we spin in
prb_lock() we wait for all CPUs that are ahead of us to finish their
call_console_drivers(), one by one. So if CPUZ is very unlucky and is in
atomic context, then prb_lock() for that CPUZ can last for
N * call_console_drivers(). And depending on N (which also includes
unfairness) and on call_console_drivers() timings, the NMI watchdog may
pay CPUZ a visit before it gets its chance to touch the watchdogs.
*Maybe* sometimes we might want to touch the watchdogs in prb_lock()
itself (a rough sketch of what I mean is at the end of this mail).

So, given the design of the new printk(), I can't help thinking that the
current "the winner takes it all" may become "the winner waits for all".

Mikulas mentioned that he observes "** X messages dropped" warnings. And
this suggests that, _most likely_, we had significantly more than 2 CPUs
calling printk() concurrently.

- A single source - a single CPU calling printk() - would not lose
  messages, because it would print its own message before it printk()-s
  another one (we could still have another CPU rescheduling under
  console_sem, but I don't think this is the case).

- Two CPUs would also probably not lose messages; Steven's console_owner
  would throttle them down.

So I think what we had was a spike of WARN/ERR printk-s coming from N
CPUs concurrently. And this brings us to another pessimistic scenario:
a very unlucky CPUZ has to spin in prb_lock() waiting for other CPUs to
print out the very same 2 million chars. Which, in terms of printk()
latency, looks to me just like the current printk.

John, sorry to ask this, but does the new printk() design always provide
latency guarantees good enough for PREEMPT_RT?

I'm surely missing something. Where am I wrong?

	-ss
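
P.S.

To make the "touch the watchdogs in prb_lock()" idea a bit more concrete,
a very rough, hand-wavy sketch. This is not John's actual prb_lock(); the
struct layout, the owner encoding and the function signature here are
invented for illustration, only the watchdog/atomic calls are real kernel
API:

#include <linux/atomic.h>
#include <linux/nmi.h>		/* touch_nmi_watchdog() */
#include <linux/smp.h>		/* smp_processor_id() */
#include <asm/processor.h>	/* cpu_relax() */

/* Invented for illustration: -1 == unlocked, otherwise the owning CPU. */
struct prb_cpulock {
	atomic_t owner;
};

/* Callers are assumed to already have preemption disabled. */
static void prb_lock(struct prb_cpulock *lock, int *cpu_store)
{
	int cpu = smp_processor_id();

	for (;;) {
		*cpu_store = atomic_read(&lock->owner);

		/* Recursion: this CPU already owns the lock. */
		if (*cpu_store == cpu)
			break;

		/* Try to take the lock if it is currently free. */
		if (*cpu_store == -1 &&
		    atomic_try_cmpxchg_acquire(&lock->owner, cpu_store, cpu))
			break;

		/*
		 * The waiter may sit here for N * call_console_drivers(),
		 * possibly with local IRQs disabled, so keep the lockup
		 * detectors quiet while the CPUs ahead of us print.
		 */
		touch_nmi_watchdog();
		cpu_relax();
	}
}

For scale: assuming, say, a 115200-baud serial console (~11.5K chars/sec),
the 2 million chars above are roughly three minutes of console time for
whichever CPU ends up spinning behind them.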