Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05.06.2011 22:15, Arne Jansen wrote:
> On 05.06.2011 21:44, Ingo Molnar wrote:
>>
>> * Arne Jansen<lists@xxxxxxxxxxxxxx>  wrote:
>>
>>>  From the timing I see I'd guess it has something to do with the
>>> scheduler kicking in during printk. I'm neither familiar with the
>>> printk code nor with the scheduler.
>>
>> Yeah, that's the well-known wake-up of klogd:
>>
>> void console_unlock(void)
>> {
>> ...
>>          up(&console_sem);
>>
>> actually ... that's not the klogd wake-up at all (!). I so suck today
>> at bug analysis :-)
>>
>> It's the console lock()/unlock() sequence, and guess what does it:
>>
>>   drivers/tty/tty_io.c:   console_lock();
>>   drivers/tty/vt/selection.c:     console_lock();
>>
>> and the vt.c code in a dozen places.
>>
>> So maybe it's some sort of tty related memory corruption that was
>> made *visible* via the extra assert that the scheduler is doing? The
>> pi_list is embedded in task struct.
>>
>> This would explain why only printk() triggers it and other wakeup
>> patterns not.
>>
>> Now, i don't really like this theory either. Why is there no other
>> type of corruption? And exactly why did only the task_struct::pi_lock
>> field get corrupted while nearby fields not? Also, none of the fields
>> near pi_lock are even remotely tty related.
> 
> Can lockdep just get confused by the lockdep_off/on calls in printk
> while scheduling is allowed? There aren't many users of lockdep_off().
> 
> I'll can try again tomorrow to get a dump of all logs from the
> watchdog, but enough for today...

I just let it dump the locks in debug_show_all_locks, even though
for some reason debug_locks is false. Don't know if the result is
helpful in any way, as it might well be inaccurate.

INFO: lockdep is turned off.

Showing all locks held in the system:
2 locks held by syslog-ng/21624:
 #0:  (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff8142ade3>]
tty_write_lock+0x23/0x60
 #1:  (&tty->output_lock){+.+...}, at: [<ffffffff8142ee7a>]
n_tty_write+0x14a/0x490
1 lock held by agetty/22174:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22175:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22176:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22177:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22178:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22179:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by agetty/22180:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8142fb86>]
n_tty_read+0x5f6/0x8e0
1 lock held by tail/22197:
 #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8185ae42>] schedule+0xe2/0x940

Some more fact that might help figure out what happens:
 - I nearly always either see all 10000 messages or only 10.
   Never 9, never 11. I saw 40 once, and once 190.
 - If I printk only 1000 lines instead of 10000, nothing bad happens
 - If /var/log/syslog is not filled with binary garbage, I also just
   see the 10 lines.

-Arne
--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Stable Commits]     [Linux Stable Kernel]     [Linux Kernel]     [Linux USB Devel]     [Linux Video &Media]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux