Hello, Daniel, Thank you for response. I've got a couple questions for you: 0. What do you think about making integration test with continuous reloading i915 driver and X server (with FLR between iteration)? Simplified example for ubuntu (root required): #!/bin/bash service lightdm stop rmmod snd_hda_intel echo -n "0000:00:02.0" > /sys/bus/pci/devices/0000\:00\:02.0/driver/unbind rmmod i915 echo 1 > /sys/bus/pci/devices/0000\:00\:02.0/reset modprobe i915 service lightdm start This will uncover next problems: * sysfs add/remove i2c connectors (parent/child warning) * drm static buffer races * NX bit violation crash (see dump below) * and bunch of DMAR problems 1. Could you point me out git/branch with FIXME comments? 2. About kfree() problem: what if put string buffer onto stack of caller for drm_get_connector_name and drm_get_encoder_name? It will be auto-removed and there is will be the patch about adding new param for these functions. (yes the patch will be big and awful to read) ======================= NX bit violation crash ======================================== Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to colour dummy device 80x25 Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: unregistered panic notifier Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not supported: please upgrade Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not supported: please upgrade Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not supported: please upgradewaiting module removal not supported: please upgrade Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 20060810 Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle kernel paging request at ffff8803eb27fcb0 Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0 Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 PMD 404902063 PTE 80000003eb27f163 Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: video] Mar 20 21:53:39 haswell01 kernel: [13457.334031] CPU: 0 PID: 4974 Comm: modprobe Not tainted 3.13.6 #10 Mar 20 21:53:39 haswell01 kernel: [13457.334032] Hardware name: /DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014 Mar 20 21:53:39 haswell01 kernel: [13457.334034] task: ffff88002e6c5fc0 ti: ffff880406394000 task.ti: ffff880406394000 Mar 20 21:53:39 haswell01 kernel: [13457.334035] RIP: 0010:[<ffff8803eb27fcb0>] [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0 Mar 20 21:53:39 haswell01 kernel: [13457.334037] RSP: 0018:ffff880406395af0 EFLAGS: 00010282 Mar 20 21:53:39 haswell01 kernel: [13457.334038] RAX: ffff8803eb27f4b0 RBX: ffff8803f6016000 RCX: ffffffffa0238630 Mar 20 21:53:39 haswell01 kernel: [13457.334039] RDX: ffff8803eb27fcb0 RSI: ffffffffa03ba3c4 RDI: ffff8803f6016000 Mar 20 21:53:39 haswell01 kernel: [13457.334040] RBP: ffff880406395b10 R08: 0000000000000000 R09: ffff88041ea172e0 Mar 20 21:53:39 haswell01 kernel: [13457.334042] R10: ffffea00101d7200 R11: 0000000000000000 R12: 0000000000000000 Mar 20 21:53:39 haswell01 kernel: [13457.334043] R13: ffffffffa03ba3c4 R14: ffffffffa03dd100 R15: 0000000000000000 Mar 20 21:53:39 haswell01 kernel: [13457.334044] FS: 00007f3f392d0740(0000) GS:ffff88041ea00000(0000) knlGS:0000000000000000 Mar 20 21:53:39 haswell01 kernel: [13457.334046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 20 21:53:39 haswell01 kernel: [13457.334047] CR2: ffff8803eb27fcb0 CR3: 00000003e7358000 CR4: 00000000001407f0 Mar 20 21:53:39 haswell01 kernel: [13457.334048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 20 21:53:39 haswell01 kernel: [13457.334050] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 20 21:53:39 haswell01 kernel: [13457.334051] Stack: Mar 20 21:53:39 haswell01 kernel: [13457.334052] ffffffffa0213cb2 ffff8803f6016000 ffff880408199000 ffff880408199098 Mar 20 21:53:39 haswell01 kernel: [13457.334054] ffff880406395b60 ffffffffa0215b92 0000000000000004 ffff880408199140 Mar 20 21:53:39 haswell01 kernel: [13457.334057] ffffffffa03b9920 ffff880408199000 0000000000000000 ffffffffa03dd000 Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace: Mar 20 21:53:39 haswell01 kernel: [13457.334067] [<ffffffffa0213cb2>] ? drm_dev_register+0xa2/0x1e0 [drm] Mar 20 21:53:39 haswell01 kernel: [13457.334073] [<ffffffffa0215b92>] drm_get_pci_dev+0x92/0x140 [drm] Mar 20 21:53:39 haswell01 kernel: [13457.334082] [<ffffffffa033d67c>] i915_pci_probe+0x3c/0x90 [i915] Mar 20 21:53:39 haswell01 kernel: [13457.334086] [<ffffffff813982b5>] local_pci_probe+0x45/0xa0 Mar 20 21:53:39 haswell01 kernel: [13457.334088] [<ffffffff81399555>] ? pci_match_device+0xc5/0xd0 Mar 20 21:53:39 haswell01 kernel: [13457.334090] [<ffffffff81399679>] pci_device_probe+0xd9/0x130 Mar 20 21:53:39 haswell01 kernel: [13457.334093] [<ffffffff81484795>] driver_probe_device+0x125/0x3b0 Mar 20 21:53:39 haswell01 kernel: [13457.334095] [<ffffffff81484af3>] __driver_attach+0x93/0xa0 Mar 20 21:53:39 haswell01 kernel: [13457.334098] [<ffffffff81484a60>] ? __device_attach+0x40/0x40 Mar 20 21:53:39 haswell01 kernel: [13457.334100] [<ffffffff81482703>] bus_for_each_dev+0x63/0xa0 Mar 20 21:53:39 haswell01 kernel: [13457.334102] [<ffffffff8148414e>] driver_attach+0x1e/0x20 Mar 20 21:53:39 haswell01 kernel: [13457.334104] [<ffffffff81483d30>] bus_add_driver+0x180/0x250 Mar 20 21:53:39 haswell01 kernel: [13457.334106] [<ffffffffa03fe000>] ? 0xffffffffa03fdfff Mar 20 21:53:39 haswell01 kernel: [13457.334109] [<ffffffff81485174>] driver_register+0x64/0xf0 Mar 20 21:53:39 haswell01 kernel: [13457.334110] [<ffffffffa03fe000>] ? 0xffffffffa03fdfff Mar 20 21:53:39 haswell01 kernel: [13457.334113] [<ffffffff81397c4c>] __pci_register_driver+0x4c/0x50 Mar 20 21:53:39 haswell01 kernel: [13457.334117] [<ffffffffa0215d5a>] drm_pci_init+0x11a/0x130 [drm] Mar 20 21:53:39 haswell01 kernel: [13457.334119] [<ffffffffa03fe000>] ? 0xffffffffa03fdfff Mar 20 21:53:39 haswell01 kernel: [13457.334127] [<ffffffffa03fe066>] i915_init+0x66/0x68 [i915] Mar 20 21:53:39 haswell01 kernel: [13457.334130] [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0 Mar 20 21:53:39 haswell01 kernel: [13457.334133] [<ffffffff810589e3>] ? set_memory_nx+0x43/0x50 Mar 20 21:53:39 haswell01 kernel: [13457.334137] [<ffffffff810e0630>] load_module+0x2050/0x26f0 Mar 20 21:53:39 haswell01 kernel: [13457.334139] [<ffffffff810dbfa0>] ? store_uevent+0x40/0x40 Mar 20 21:53:39 haswell01 kernel: [13457.334141] [<ffffffff810e0e46>] SyS_finit_module+0x86/0xb0 Mar 20 21:53:39 haswell01 kernel: [13457.334144] [<ffffffff8171823f>] tracesys+0xe1/0xe6 Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 <f0> b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff ff ff ff ff 7f 00 Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0 Mar 20 21:53:39 haswell01 kernel: [13457.334166] RSP <ffff880406395af0> Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: ffff8803eb27fcb0 Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace e2598b1fc83a65fd ]--- ________________________________________ From: Daniel Vetter <daniel.vetter@xxxxxxxx> on behalf of Daniel Vetter <daniel@xxxxxxxx> Sent: Tuesday, March 25, 2014 12:31 AM To: Dmitry Malkin Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race On Mon, Mar 24, 2014 at 12:06:21PM +0000, Dmitry Malkin wrote: > > Hello guys, > > I've been playing with reloading intel gfx driver (i915) in a cycle, for a while, > and at some point I've found a non-deterministic kernel crash with a highly-variable > iteration dependency -- 2 to 200 driver reload iterations. > > The apparent race is over the shared internal string buffer in drm_get_connector_name(). > It is mostly harmless, due to its results being mostly used for log output, but in at least one > case -- drm_sysfs_connector_add() -- this leads to a more critical error. > > Race scenario: > > - drm_sysfs_connector_add() > - drm_get_connector_name() > vs. > - anything that generates log messages involving DRM connectors > - drm_get_connector_name() > > (and many other from drm_crtc.c) shares with caller const char* to internal static char buffer. > If something call it from other thread, while main thread strore and use returned pointer it may overwrite connector name. > > Here are we go: registering HDMI connecter (drm_sysfs_connector_add store and use pointer from drm_get_connector_name) > and the same time got VGA connector name down through the stack. (the second thread is upowerd who watch continuously sysfs) Yeah, in my recent kerneldoc series I've noticed this too and added FIXME comments. There's a lot more than just those you've pointed out. The problem is that fixing these will be an awful lot of work since you need to add proper cleanup code (calling kfree()) to all the required places. So a full subsystem wide code audit for every single use-site of these. -Daniel _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel