Re: 3.2-rc1 and nvidia drivers

On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>> freezes when starting X on several different hardware setups (a few systems
>> work fine). This is certainly caused by this combination. When using the
>> nouveau-driver everything works fine.
>
> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?


Hello,

Thank you for that tip. I have now tried this and did not find any warnings that seem related to the nvidia-driver. Further testing revealed that the driver works fine with CONFIG_PREEMPT_RTB, and that the freezes when running startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.

Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray that, however, seem unrelated to nvidia. As we could not find any other reports of this locking warning, I have attached one example below. You can find several complete bootlogs (all with deadlock warnings, each with a slightly different call stack) and my kernel config at

http://www.lsr.ei.tum.de/team/schauss/lockdep/
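To compare the slightly different call stacks across those bootlogs, one option is to reduce each report to its reliable frames. The following is only a minimal sketch, assuming the syslog line format of the report attached below; `reliable_frames` is a hypothetical helper, not an existing tool. Entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm, so the sketch skips them.

```python
import re

# Hypothetical helper: matches one call-trace entry in a syslog line, e.g.
# "[   30.751015] [<ffffffff81167a41>] kmem_cache_free+0x221/0x300"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return function names after 'Call Trace:', skipping '?' (unreliable) entries."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True  # lock lines before the trace also carry [<addr>]; ignore them
            continue
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append(m.group("func"))
    return frames


sample = """\
Nov 17 17:34:49 fix kernel: [   30.750982] Call Trace:
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81626999>] rt_spin_lock+0x39/0x40
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81167a41>] kmem_cache_free+0x221/0x300
"""

print(reliable_frames(sample))  # ['rt_spin_lock', 'cache_flusharray', 'kmem_cache_free']
```

Running it over each posted bootlog and diffing the resulting lists would show where the stacks actually diverge.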

On rt-base I also get a lockdep warning, which, however, seems unrelated to the rt-full one (it is not in cache_flusharray). You can find that log on the same page.

Best Regards,
Thomas



Nov 17 17:34:49 fix kernel: [   30.750925] =============================================
Nov 17 17:34:49 fix kernel: [   30.750927] [ INFO: possible recursive locking detected ]
Nov 17 17:34:49 fix kernel: [   30.750930] 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750931] ---------------------------------------------
Nov 17 17:34:49 fix kernel: [   30.750933] udevd/517 is trying to acquire lock:
Nov 17 17:34:49 fix kernel: [   30.750935] (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750944]
Nov 17 17:34:49 fix kernel: [   30.750945] but task is already holding lock:
Nov 17 17:34:49 fix kernel: [   30.750946] (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750950]
Nov 17 17:34:49 fix kernel: [   30.750951] other info that might help us debug this:
Nov 17 17:34:49 fix kernel: [   30.750952] Possible unsafe locking scenario:
Nov 17 17:34:49 fix kernel: [   30.750953]
Nov 17 17:34:49 fix kernel: [   30.750954]        CPU0
Nov 17 17:34:49 fix kernel: [   30.750955]        ----
Nov 17 17:34:49 fix kernel: [   30.750956]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750958]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750959]
Nov 17 17:34:49 fix kernel: [   30.750960]  *** DEADLOCK ***
Nov 17 17:34:49 fix kernel: [   30.750961]
Nov 17 17:34:49 fix kernel: [   30.750962] May be due to missing lock nesting notation
Nov 17 17:34:49 fix kernel: [   30.750963]
Nov 17 17:34:49 fix kernel: [   30.750964] 2 locks held by udevd/517:
Nov 17 17:34:49 fix kernel: [   30.750966] #0: (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<ffffffff8116a5c6>] kfree+0xd6/0x380
Nov 17 17:34:49 fix kernel: [   30.750973] #1: (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750977]
Nov 17 17:34:49 fix kernel: [   30.750977] stack backtrace:
Nov 17 17:34:49 fix kernel: [   30.750980] Pid: 517, comm: udevd Not tainted 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750982] Call Trace:
Nov 17 17:34:49 fix kernel: [   30.750987] [<ffffffff810a0097>] print_deadlock_bug+0xf7/0x100
Nov 17 17:34:49 fix kernel: [   30.750991] [<ffffffff810a1add>] validate_chain.isra.37+0x67d/0x720
Nov 17 17:34:49 fix kernel: [   30.750995] [<ffffffff810a2478>] __lock_acquire+0x478/0x9c0
Nov 17 17:34:49 fix kernel: [   30.750999] [<ffffffff8162ae19>] ? sub_preempt_count+0x29/0x60
Nov 17 17:34:49 fix kernel: [   30.751003] [<ffffffff81627475>] ? _raw_spin_unlock+0x35/0x60
Nov 17 17:34:49 fix kernel: [   30.751007] [<ffffffff81625f0b>] ? rt_spin_lock_slowlock+0x2eb/0x340
Nov 17 17:34:49 fix kernel: [   30.751011] [<ffffffff81056be1>] ? get_parent_ip+0x11/0x50
Nov 17 17:34:49 fix kernel: [   30.751014] [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff810a2f64>] lock_acquire+0x94/0x160
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81626999>] rt_spin_lock+0x39/0x40
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8105a90b>] ? migrate_disable+0x6b/0xe0
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81167a41>] kmem_cache_free+0x221/0x300
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81167b8f>] slab_destroy+0x6f/0xa0
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81167d32>] free_block+0x172/0x190
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81613eb4>] cache_flusharray+0x98/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8116a806>] kfree+0x316/0x380
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814f5328>] ? skb_queue_purge+0x28/0x40
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814f1110>] __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814f11d5>] sk_free+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8152d908>] netlink_release+0x128/0x200
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814ea388>] sock_release+0x28/0x90
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff814eaa57>] sock_close+0x17/0x30
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8117b914>] __fput+0xb4/0x200
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8117ba85>] fput+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81177d0c>] filp_close+0x6c/0x90
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff81177df0>] sys_close+0xc0/0x130
Nov 17 17:34:49 fix kernel: [   30.751015] [<ffffffff8162ed02>] system_call_fastpath+0x16/0x1b
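The report's hint "May be due to missing lock nesting notation" stems from lockdep validating lock classes rather than individual lock instances: all locks initialized at the same place share one class, so taking two different `&parent->list_lock` locks in a nested fashion is flagged as recursion even if no single lock is taken twice. Intentional nesting of that kind is normally annotated with a subclass (for example via `spin_lock_nested()`). The sketch below is a hypothetical toy model in Python, not kernel code, meant only to illustrate that check; `ToyLockdep` and the "cache A/B" names are made up, and whether the two list_locks in this trace really belong to different caches is not established here.

```python
class LockClass:
    """Hypothetical: all locks initialized at one site share a class (compared by identity)."""
    def __init__(self, name):
        self.name = name


class ToyLock:
    def __init__(self, lock_class, instance):
        self.lock_class = lock_class
        self.instance = instance


class ToyLockdep:
    """Hypothetical toy model of a class-based recursive-locking check."""
    def __init__(self):
        self.held = []     # stack of (class, subclass) keys currently held
        self.reports = []  # warnings collected instead of printed

    def acquire(self, lock, subclass=0):
        # Only the (class, subclass) key is checked, never the lock instance.
        key = (lock.lock_class, subclass)
        if key in self.held:
            self.reports.append(
                f"possible recursive locking: {lock.lock_class.name} ({lock.instance})"
            )
        self.held.append(key)

    def release(self, lock, subclass=0):
        self.held.remove((lock.lock_class, subclass))


list_lock = LockClass("&parent->list_lock")
outer = ToyLock(list_lock, "list_lock of hypothetical cache A")
inner = ToyLock(list_lock, "list_lock of hypothetical cache B")

dep = ToyLockdep()

# Two distinct instances of one class, no annotation: reported.
dep.acquire(outer)
dep.acquire(inner)
dep.release(inner)
dep.release(outer)

# Same nesting with the inner lock marked as subclass 1: not reported.
dep.acquire(outer)
dep.acquire(inner, subclass=1)
dep.release(inner, subclass=1)
dep.release(outer)

print(dep.reports)  # one report, for the unannotated inner acquisition
```

Such a warning is therefore either a real self-deadlock (the same lock taken twice) or a false positive that needs an annotation; the report alone does not tell the two apart.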
begin:vcard
fn:Thomas Schauss
n:Schauss;Thomas
org:Technische Universitaet Muenchen (TUM);Institute of Automatic Control Engineering (LSR)
adr:;;Theresienstr. 90;Munich;;80333;Germany
email;internet:schauss@xxxxxx
title:Dipl.-Ing. (Univ.)
tel;work:+49 89 289 23406
tel;fax:+49 89 289 28340
url:http://www.lsr.ei.tum.de
version:2.1
end:vcard

