On Oct 7, 2019, at 4:07 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
I do not think that removing the printk is the right long term solution. While I do agree that removing the debugging printk in __offline_isolated_pages does make sense because it is essentially of very limited use, this doesn't really solve the underlying problem. There are likely other printks under zone->lock. It would be much saner to actually disallow consoles from allocating any memory while printk is called from an atomic context.
No, there are only a handful of places that call printk() with zone->lock held. It is normal for callers to quietly make their “struct zone” modifications in a short section with zone->lock held.
No, it is not about “allocate any memory while printk is called from an atomic context”. It is about opposite lock chains taken on different processors, which has the same effect. For example,
CPU0:            CPU1:            CPU2:
console_owner
sclp_lock
                 sclp_lock
                 zone_lock
                                  zone_lock
                                  console_owner
Here it is a deadlock.
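To make the chain above concrete, here is a minimal Python sketch. It is a toy model only: it is not kernel code and not how lockdep is implemented, and the names CHAINS, build_order_graph and find_cycle are made up for illustration. It records "held lock -> lock being acquired" for each CPU and looks for a cycle in the resulting lock-order graph.

```python
# Toy model of the three-CPU lock chain above. Hypothetical names throughout;
# this only illustrates why the three orderings together form a cycle.
from typing import Dict, List, Optional, Set, Tuple

# (lock already held, lock being acquired) for each CPU, as in the chain above.
CHAINS: Dict[str, Tuple[str, str]] = {
    "CPU0": ("console_owner", "sclp_lock"),
    "CPU1": ("sclp_lock", "zone_lock"),
    "CPU2": ("zone_lock", "console_owner"),
}


def build_order_graph(chains: Dict[str, Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each held lock to the set of locks acquired while holding it."""
    graph: Dict[str, Set[str]] = {}
    for held, wanted in chains.values():
        graph.setdefault(held, set()).add(wanted)
        graph.setdefault(wanted, set())
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of lock names found by depth-first search, or None."""
    visiting: List[str] = []   # locks on the current DFS path
    done: Set[str] = set()     # locks fully explored without finding a cycle

    def dfs(node: str) -> Optional[List[str]]:
        if node in visiting:
            # Back edge: the path from the first visit of node closes a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph[node]):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    cycle = find_cycle(build_order_graph(CHAINS))
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Dropping any one of the three orderings from CHAINS (for example the CPU2 entry, zone_lock -> console_owner) leaves the graph acyclic in this model, which is why removing the printk() under zone->lock breaks this particular chain.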
The problem has probably been there forever, but not many developers
run memory offline with lockdep enabled, nor have admins in the field
been lucky enough yet to hit the perfect timing required to trigger a
real deadlock. In addition, there aren't many places that call printk()
while zone->lock is held.
WARNING: possible circular locking dependency detected
------------------------------------------------------
test.sh/1724 is trying to acquire lock:
0000000052059ec0 (console_owner){-...}, at: console_unlock+0x328/0xa30
but task is already holding lock:
000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x216/0x538
I am also wondering what this lockdep report actually says. How come we have a dependency between a start_kernel path and a syscall?
Petr explained it correctly.