On Wed, 2019-10-09 at 13:49 +0200, Petr Mladek wrote: > On Tue 2019-10-08 15:35:24, Qian Cai wrote: > > On Tue, 2019-10-08 at 21:17 +0200, Michal Hocko wrote: > > > On Tue 08-10-19 15:06:13, Qian Cai wrote: > > > > On Tue, 2019-10-08 at 20:35 +0200, Michal Hocko wrote: > > > > > > [...] > > > > > I fully agree that this class of lockdep splats are annoying especially > > > > > when they make the lockdep unusable but please discuss this with lockdep > > > > > maintainers and try to find some solution rather than go and try to > > > > > workaround the problem all over the place. If there are places that > > > > > would result in a cleaner code then go for it but please do not make the > > > > > code worse just because of a non existent problem flagged by a false > > > > > positive. > > > > > > > > It makes me wonder what make you think it is a false positive for sure. > > > > > > Because this is an early init code? Because if it were a real deadlock > > > > No, that alone does not prove it is a false positive. There are many places > > could generate that lock dependency but lockdep will always save the earliest > > one. > > You are still mixing the pasted lockdep report and other situations > that will not get reported because of the first one. The lockdep report is designed to only just give a clue on the existing locking dependency. Then, it is normal that developers need to audit all places of that particular locking dependency to confirm it is a true false positive. > > We believe that the pasted report is pasted is false positive. And we > are against complicating the code just to avoid this false positive. This is totally incorrect as above and there is even another similar example of the splat during memory offline I mentioned earlier, https://lore.kernel.org/linux-mm/1570460350.5576.290.camel@xxxxxx/ [ 297.425964] -> #1 (&port_lock_key){-.-.}: [ 297.425967] __lock_acquire+0x5b3/0xb40 [ 297.425967] lock_acquire+0x126/0x280 [ 297.425968] _raw_spin_lock_irqsave+0x3a/0x50 [ 297.425969] serial8250_console_write+0x3e4/0x450 [ 297.425970] univ8250_console_write+0x4b/0x60 [ 297.425970] console_unlock+0x501/0x750 [ 297.425971] vprintk_emit+0x10d/0x340 [ 297.425972] vprintk_default+0x1f/0x30 [ 297.425972] vprintk_func+0x44/0xd4 [ 297.425973] printk+0x9f/0xc5 [ 297.425974] register_console+0x39c/0x520 [ 297.425975] univ8250_console_init+0x23/0x2d [ 297.425975] console_init+0x338/0x4cd [ 297.425976] start_kernel+0x534/0x724 [ 297.425977] x86_64_start_reservations+0x24/0x26 [ 297.425977] x86_64_start_kernel+0xf4/0xfb [ 297.425978] secondary_startup_64+0xb6/0xc0 where the report again show the early boot call trace for the locking dependency, console_owner --> port_lock_key but that dependency clearly not only happen in the early boot. I agree with you that it is hard to hit because it needs 4 CPUs to hit the exact the same spot. > > Are you sure that the workaround will not create real deadlocks > or races? > > I see two realistic possibilities: > > + Make printk() always deferred. This will hopefully happen soon. > > + Improve lockdep so that false positives could get ignored > without complicating the code. There are certainly rooms to improve the lockdep but it does not help in this case as mentioned above.