On Mon, 2019-10-07 at 11:05 +0200, Petr Mladek wrote:
> On Mon 2019-10-07 10:07:42, Michal Hocko wrote:
> > On Fri 04-10-19 18:26:45, Qian Cai wrote:
> > > It is unsafe to call printk() while zone->lock is held, i.e.,
> > >
> > > zone->lock --> console_lock
> > >
> > > because the console could always allocate some memory in different code
> > > paths and form locking chains in the opposite order,
> > >
> > > console_lock --> * --> zone->lock
> > >
> > > As a result, it triggers lockdep splats like below and in different
> > > code paths in this thread [1]. has_unmovable_pages() is only used in
> > > set_migratetype_isolate() and is_pageblock_removable_nolock(), and only
> > > the former sets the REPORT_FAILURE flag, which results in a printk() call.
> > > Hence, unlock zone->lock just before the dump_page() call made when
> > > has_unmovable_pages() returns true; there is no need to hold the lock
> > > for the rest of set_migratetype_isolate() anyway.
> > >
> > > While at it, also remove a problematic debugging-only printk() in
> > > __offline_isolated_pages(), which will always disable lockdep on debug
> > > kernels.
> >
> > I do not think that removing the printk is the right long-term solution.
> > While I do agree that removing the debugging printk in
> > __offline_isolated_pages() makes sense, because it is of very limited
> > use, this doesn't really solve the underlying problem. There are likely
> > other printk() calls made under zone->lock. It would be much saner to
> > disallow consoles from allocating any memory while printk is called
> > from an atomic context.
>
> The current "standard" solution for these situations is to replace
> the problematic printk() with printk_deferred(). It defers the
> console handling.
>
> Of course, this is a whack-a-mole approach. The long-term solution
> is to defer printk() by default. We finally agreed on this a few
> weeks ago at the Plumbers conference.
> It is going to be added together with the fully lockless log buffer,
> hopefully soon. It will be part of upstreaming the Real-Time related
> code.

Does this guarantee that, given

    lock(zone->lock)
    printk_deferred()
    unlock(zone->lock)

the locks taken for the console output (console_owner and console_sem)
will always be acquired only after unlock(zone->lock)? If it is more of a
timing thing, depending on when klogd wakes up, it could still end up with

    zone->lock -> console_owner/console_sem

which causes a deadlock.
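A minimal user-space sketch of the "defer the console work" idea, assuming POSIX threads: a message produced while a stand-in for zone->lock is held is only appended to a small buffer guarded by a leaf lock, and the "console" work (writing to a FILE *) happens after every lock has been dropped, so no zone->lock -> console edge is ever formed. The names zone_lock, defer_lock, log_deferred(), flush_deferred() and check_pages_locked() are hypothetical and are not kernel APIs. The sketch also does not model when the kernel actually performs printk_deferred() output, which is exactly the open question above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for zone->lock; not a kernel API. */
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;

/* Leaf lock guarding the deferred-message buffer. It is never held while
 * doing "console" output, so it cannot close a lock cycle. */
static pthread_mutex_t defer_lock = PTHREAD_MUTEX_INITIALIZER;

#define DEFER_BUF_SIZE 256
static char defer_buf[DEFER_BUF_SIZE];
static size_t defer_len;

/* Append a message to the buffer instead of writing it out. Safe to call
 * with zone_lock held: the only nesting is zone_lock -> defer_lock. */
static void log_deferred(const char *fmt, ...)
{
	va_list ap;
	int n;

	pthread_mutex_lock(&defer_lock);
	va_start(ap, fmt);
	n = vsnprintf(defer_buf + defer_len, DEFER_BUF_SIZE - defer_len, fmt, ap);
	va_end(ap);
	if (n > 0) {
		defer_len += (size_t)n;
		if (defer_len >= DEFER_BUF_SIZE)
			defer_len = DEFER_BUF_SIZE - 1; /* message was truncated */
	}
	pthread_mutex_unlock(&defer_lock);
}

/* Emit pending messages. Must be called with no other lock held: the
 * buffer is copied out under defer_lock and written after unlocking. */
static size_t flush_deferred(FILE *out)
{
	char local[DEFER_BUF_SIZE];
	size_t len;

	pthread_mutex_lock(&defer_lock);
	len = defer_len;
	assert(len < DEFER_BUF_SIZE);
	memcpy(local, defer_buf, len);
	defer_len = 0;
	pthread_mutex_unlock(&defer_lock);

	fwrite(local, 1, len, out); /* the "console" work, with no lock held */
	return len;
}

/* Hypothetical check that runs under zone_lock and may need to report a
 * failure; it only queues the report while the lock is held. */
static int check_pages_locked(int unmovable)
{
	int ret = 0;

	pthread_mutex_lock(&zone_lock);
	if (unmovable) {
		log_deferred("page is unmovable\n");
		ret = -1;
	}
	pthread_mutex_unlock(&zone_lock);
	return ret;
}
```

For example, calling check_pages_locked(1) and then flush_deferred(stdout) writes the queued message only after zone_lock has been released. Copying the buffer out under defer_lock and writing it after unlocking keeps defer_lock a leaf lock: it is never held while calling into output code.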