Re: [PATCH] net/skbuff: silence warnings under memory pressure

Qian Cai <cai@xxxxxx> · Tue, 19 Nov 2019 10:58:27 -0500

On Tue, 2019-11-19 at 10:41 +0100, Petr Mladek wrote:
> On Tue 2019-11-19 09:41:19, Sergey Senozhatsky wrote:
> > On (19/11/18 16:27), Petr Mladek wrote:
> > > > > @@ -2027,8 +2027,11 @@ asmlinkage int vprintk_emit(int facility, int level,
> > > > >  	pending_output = (curr_log_seq != log_next_seq);
> > > > >  	logbuf_unlock_irqrestore(flags);
> > > > >  
> > > > > +	if (!pending_output)
> > > > > +		return printed_len;
> > > > > +
> > > > >  	/* If called from the scheduler, we can not call up(). */
> > > > > -	if (!in_sched && pending_output) {
> > > > > +	if (!in_sched) {
> > > > >  		/*
> > > > >  		 * Disable preemption to avoid being preempted while holding
> > > > >  		 * console_sem which would prevent anyone from printing to
> > > > > @@ -2043,10 +2046,11 @@ asmlinkage int vprintk_emit(int facility, int level,
> > > > >  		if (console_trylock_spinning())
> > > > >  			console_unlock();
> > > > >  		preempt_enable();
> > > > > -	}
> > > > >  
> > > > > -	if (pending_output)
> > > > > +		wake_up_interruptible(&log_wait);
> > > 
> > > I do not like this. As a result, normal printk() will always deadlock
> > > in the scheduler code, including WARN() calls. The chance of the
> > > deadlock is small now. It happens only when there is another
> > > process waiting for console_sem.
> > 
> > Why would it *always* deadlock? If this is the case, why we don't *always*
> > deadlock doing the very same wake_up_process() from console_unlock()?
> 
> I speak about _normal_ printk() and not about printk_deferred().
> 
> wake_up_process() is called in console_unlock() only when
> sem->wait_list is not empty, see up() in kernel/locking/semaphore.c.
> printk() itself uses console_trylock() and does not wait.
> 
> I believe that this is the reason why printk_sched() was added
> so late in 2012. It was more than 10 years after adding
> the semaphore into console_unlock(). IMHO, the deadlock
> was rare. Of course, it was also hard to debug but it
> would not take 10 years.

I would not be surprised if those potential deadlocks had existed for even
10 years. Not only are they difficult to debug, but when someone eventually
did report them, subsystem developers could still "kick the ball" back and
forth, as has been observed over the last few months, and no progress could
be made, because life is too short and the reporters eventually have to give
up.
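
For reference, here is a minimal userspace sketch of the up() behaviour
described in the quoted text above: the release path enters the wake-up code
only when a waiter is queued, and a trylock-style caller never queues itself.
This is a simplified model and not the code in kernel/locking/semaphore.c.
All names (model_sem, model_trylock, model_down_would_sleep, model_up,
model_demo) are hypothetical, and the counters stand in for sem->wait_list
and wake_up_process().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; hypothetical names, no real locking or sleeping. */
struct model_sem {
	unsigned int count;	/* 1 == free, 0 == held */
	unsigned int waiters;	/* stands in for a non-empty sem->wait_list */
	unsigned int wakeups;	/* how often the wake-up path was taken */
};

/* Trylock-style acquire: take the semaphore if free, never queue a waiter. */
static bool model_trylock(struct model_sem *sem)
{
	if (sem->count == 0)
		return false;
	sem->count--;
	return true;
}

/* Sleeping-style acquire: queue as a waiter when the semaphore is held. */
static void model_down_would_sleep(struct model_sem *sem)
{
	if (sem->count > 0)
		sem->count--;
	else
		sem->waiters++;
}

/* Release: with no waiter just bump count, otherwise hand over and wake. */
static void model_up(struct model_sem *sem)
{
	if (sem->waiters == 0) {
		sem->count++;
		return;
	}
	sem->waiters--;		/* ownership passes directly to the waiter */
	sem->wakeups++;		/* the real code would enter the scheduler here */
}

/* Walk through both cases; returns the number of wake-ups observed. */
static unsigned int model_demo(void)
{
	struct model_sem sem = { .count = 1 };

	/* Only trylock users: release never reaches the wake-up path. */
	assert(model_trylock(&sem));
	assert(!model_trylock(&sem));
	model_up(&sem);
	assert(sem.wakeups == 0);

	/* One sleeping waiter: release now has to wake it. */
	assert(model_trylock(&sem));
	model_down_would_sleep(&sem);
	model_up(&sem);

	return sem.wakeups;
}
```

In this model the wake-up path is reached only in the second half of
model_demo(), that is, only once something other than a trylock caller is
waiting on the semaphore.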