On Thu, Apr 6, 2017 at 8:01 PM, Thomas Hellstrom <thellstrom@xxxxxxxxxx> wrote:
> On 04/06/2017 04:46 PM, Daniel Vetter wrote:
>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom <thellstrom@xxxxxxxxxx> wrote:
>>> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>>>> Hi Thomas,
>>>>
>>>> Bisected an offender already? Afaik there's no one else who reported
>>>> issues thus far, and for our own CI it seems all still fine.
>>>> -Daniel
>>> Hi, Daniel,
>>>
>>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
>>> to be
>>>
>>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>>>
>>> Reverting first 1fa4da04 and then
>>> 38b6441e
>>>
>>> fixes the problem.
>> Yeah, we seem to have a solid functional conflict between the vmwgfx
>> atomic conversion, and the changes in drm-misc-next. Preliminary
>> analysis, but I think what's going on is:
>> - With the above changes in -misc we punt the deadlock retry loop to
>>   the callers of ->set_config.
>> - But since it would have been way too invasive, I only fixed up the
>>   atomic callers (in most places we have special paths for atomic and
>>   non-atomic due to slightly different semantics), which means for
>>   legacy functions we in some cases pass a NULL ctx down to
>>   ->set_config. But since legacy paths only get called on legacy
>>   drivers, no problem.
>> - Well except I've done that audit before vmwgfx became atomic, and
>>   that audit is now wrong, and I've forgotten to properly re-audit when
>>   the conflicts happened all around. But since I half-expect to hit a
>>   mid-driver conversion with this I did sprinkle
>>   WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
>>
>> So assuming this is correct, you should see a pile of WARN_ON
>> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
>> combo. The proper fix would be to switch over to atomic primitives for
>> all these cases.
>> On a quick look I see some in the vmwgfx fbdev
>> emulation code, might even be worth it to check whether we could reuse
>> the core helpers (which do this split handling already) in some cases.
>>
>> Cheers, Daniel
>
> So with the two reverts previously mentioned applied, I see the
> following. Is this consistent with the above?
>
> FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that
> time we didn't have the callbacks
> necessary to use the helpers. Maybe that has changed with the atomic
> implementation.
>
> Considering that Sinclair just had a baby, I'm not 100% sure though,
> that I have time to fix this up in the vmwgfx driver for this merge
> window...
>
> /Thomas
>
>
> [ 9.547101] WARNING: CPU: 3 PID: 359 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 9.547102] Modules linked in: snd_rawmidi snd_timer
> ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd
> rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis
> tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [ 9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: G W
> 4.11.0-rc4+ #2
> [ 9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 9.547123] Call Trace:
> [ 9.547128] dump_stack+0x63/0x86
> [ 9.547130] __warn+0xcb/0xf0
> [ 9.547131] warn_slowpath_null+0x1d/0x20
> [ 9.547137] drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 9.547143] vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx]
> [ 9.547145] ? __check_object_size+0x100/0x19d
> [ 9.547152] drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm]
> [ 9.547158] drm_ioctl+0x209/0x4c0 [drm]
> [ 9.547164] ? drm_mode_getfb+0x100/0x100 [drm]
> [ 9.547165] ? __do_fault+0x1e/0x110
> [ 9.547169] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 9.547175] ? drm_getunique+0xa0/0xa0 [drm]
> [ 9.547179] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 9.547180] do_vfs_ioctl+0xa3/0x5f0
> [ 9.547181] SyS_ioctl+0x79/0x90
> [ 9.547182] do_syscall_64+0x67/0x180
> [ 9.547184] entry_SYSCALL64_slow_path+0x25/0x25
> [ 9.547185] RIP: 0033:0x7fd4c93b7787
> [ 9.547186] RSP: 002b:00007fff17d06b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 9.547187] RAX: ffffffffffffffda RBX: 0000000000000c80 RCX:
> 00007fd4c93b7787
> [ 9.547187] RDX: 00007fff17d06bc0 RSI: 00000000c01864b1 RDI:
> 0000000000000009
> [ 9.547188] RBP: 00007fff17d06bc0 R08: 00007fd4c7554000 R09:
> 00007fd4ca1e9010
> [ 9.547188] R10: 0000558ffe14ca40 R11: 0000000000000246 R12:
> 00000000c01864b1
> [ 9.547188] R13: 0000000000000009 R14: 0000000000000000 R15:
> 0000000000000258
> [ 9.547190] ---[ end trace 46a3554c8816a28b ]---

This is an artifact of the two reverts: I've forgotten to properly clear
config->acquire_ctx again in the intermediate states.

> [ 4.824456] WARNING: CPU: 2 PID: 359 at drivers/gpu/drm/drm_crtc.c:499
> drm_mode_set_config_internal+0x40/0x50 [drm]
> [ 4.824457] Modules linked in: vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000(+) mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [ 4.824467] CPU: 2 PID: 359 Comm: plymouthd Tainted: G W
> 4.11.0-rc4+ #2
> [ 4.824468] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 4.824468] Call Trace:
> [ 4.824474] dump_stack+0x63/0x86
> [ 4.824476] __warn+0xcb/0xf0
> [ 4.824477] warn_slowpath_null+0x1d/0x20
> [ 4.824483] drm_mode_set_config_internal+0x40/0x50 [drm]
> [ 4.824492] vmw_fb_set_par+0x269/0x580 [vmwgfx]
> [ 4.824494] ? selinux_capable+0x20/0x30
> [ 4.824498] ? ttm_mem_global_reserve.constprop.6+0xd6/0x100 [ttm]
> [ 4.824503] vmw_fb_on+0x24/0x60 [vmwgfx]
> [ 4.824506] vmw_master_drop+0x81/0xc0 [vmwgfx]
> [ 4.824511] drm_drop_master+0x21/0x50 [drm]
> [ 4.824516] drm_dropmaster_ioctl+0x6c/0x70 [drm]
> [ 4.824521] drm_ioctl+0x209/0x4c0 [drm]
> [ 4.824526] ? drm_setmaster_ioctl+0xa0/0xa0 [drm]
> [ 4.824528] ? do_filp_open+0xa5/0x100
> [ 4.824532] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 4.824537] ? drm_getunique+0xa0/0xa0 [drm]
> [ 4.824541] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 4.824543] do_vfs_ioctl+0xa3/0x5f0
> [ 4.824544] SyS_ioctl+0x79/0x90
> [ 4.824545] do_syscall_64+0x67/0x180
> [ 4.824547] entry_SYSCALL64_slow_path+0x25/0x25
> [ 4.824548] RIP: 0033:0x7fd4c93b7787
> [ 4.824549] RSP: 002b:00007fff17d06d98 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 4.824550] RAX: ffffffffffffffda RBX: 0000558ffe145260 RCX:
> 00007fd4c93b7787
> [ 4.824550] RDX: 0000000000000000 RSI: 000000000000641f RDI:
> 0000000000000009
> [ 4.824551] RBP: 0000000000000000 R08: 00007fd4c967ab98 R09:
> 0000000000000005
> [ 4.824551] R10: 0000558ffe145390 R11: 0000000000000246 R12:
> 000000000000641f
> [ 4.824552] R13: 0000000000000009 R14: 00007fd4c9da78e0 R15:
> 0000000000000000
> [ 4.824553] ---[ end trace 46a3554c8816a28a ]---

Yeah, this is the "don't do that" case that I expected.
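
For anyone following along, the deadlock retry loop that has moved into
the callers of ->set_config can be sketched in plain user-space C. This
is only an illustration under stated assumptions, not the actual drm
code: mock_acquire_ctx, mock_set_config, mock_backoff and
mock_set_config_with_retry are all hypothetical names, and returning
-EINVAL for a NULL ctx is an assumption standing in for the WARN_ON that
the legacy paths hit on an atomic driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an acquire context; not the drm struct. */
struct mock_acquire_ctx {
	int contended; /* how many more times the mock reports -EDEADLK */
	int backoffs;  /* how many times the caller had to back off */
};

/*
 * Hypothetical stand-in for a ->set_config hook. It no longer retries
 * internally: on contention it just returns -EDEADLK to the caller.
 * A NULL ctx (the legacy-caller case) is rejected; -EINVAL is an
 * assumption made for this sketch.
 */
int mock_set_config(struct mock_acquire_ctx *ctx)
{
	if (ctx == NULL)
		return -EINVAL;
	if (ctx->contended > 0) {
		ctx->contended--;
		return -EDEADLK;
	}
	return 0;
}

/* Mock backoff step: only counts how often it was needed. */
void mock_backoff(struct mock_acquire_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->backoffs++;
}

/* Caller-side retry loop: back off and retry for as long as -EDEADLK. */
int mock_set_config_with_retry(struct mock_acquire_ctx *ctx)
{
	int ret;

	for (;;) {
		ret = mock_set_config(ctx);
		if (ret != -EDEADLK)
			return ret;
		mock_backoff(ctx);
	}
}
```

The point of the sketch is only the division of labour: the hook reports
contention, and whoever owns the acquire context is the one who backs
off and retries. A caller that has no context to pass has nothing to
retry with, which is why the legacy paths need converting.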
> [ 19.720064] WARNING: CPU: 0 PID: 1316 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 19.720065] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
> nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
> nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat
> ip6table_security ip6table_raw ip6table_mangle ip6table_nat
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_security
> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack libcrc32c ebtable_filter ebtables
> ip6table_filter ip6_tables vmw_vsock_vmci_transport vsock bnep
> snd_seq_midi snd_seq_midi_event snd_ens1371 gameport snd_ac97_codec
> crct10dif_pclmul ac97_bus btusb btrtl btbcm btintel snd_seq bluetooth
> snd_pcm crc32_pclmul snd_rawmidi snd_timer ghash_clmulni_intel
> intel_rapl_perf ppdev snd_seq_device
> [ 19.720091] vmw_balloon snd rfkill joydev soundcore nfit parport_pc
> parport acpi_cpufreq tpm_tis tpm_tis_core tpm shpchp vmw_vmci i2c_piix4
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm
> drm mptspi scsi_transport_spi mptscsih crc32c_intel e1000 mptbase
> ata_generic serio_raw pata_acpi uas usb_storage
> [ 19.720106] CPU: 0 PID: 1316 Comm: Xorg Tainted: G W
> 4.11.0-rc4+ #2
> [ 19.720107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 19.720107] Call Trace:
> [ 19.720113] dump_stack+0x63/0x86
> [ 19.720115] __warn+0xcb/0xf0
> [ 19.720116] warn_slowpath_null+0x1d/0x20
> [ 19.720123] drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 19.720129] drm_mode_gamma_set_ioctl+0x3a/0x180 [drm]
> [ 19.720134] drm_ioctl+0x209/0x4c0 [drm]
> [ 19.720140] ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
> [ 19.720151] ? add_wait_queue+0x65/0x80
> [ 19.720158] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 19.720163] ? drm_getunique+0xa0/0xa0 [drm]
> [ 19.720167] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 19.720169] do_vfs_ioctl+0xa3/0x5f0
> [ 19.720170] ? sk_prot_alloc+0x5/0x120
> [ 19.720171] SyS_ioctl+0x79/0x90
> [ 19.720173] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [ 19.720174] RIP: 0033:0x7f9eb9f24787
> [ 19.720175] RSP: 002b:00007ffd90012b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 19.720176] RAX: ffffffffffffffda RBX: 000000000222ffe0 RCX:
> 00007f9eb9f24787
> [ 19.720176] RDX: 00007ffd90012bc0 RSI: 00000000c02064a5 RDI:
> 000000000000000c
> [ 19.720176] RBP: 00007f9eba1e43c0 R08: 0000000002130fb0 R09:
> 00000000021311b0
> [ 19.720177] R10: 0000000000000088 R11: 0000000000000246 R12:
> 0000000000000000
> [ 19.720177] R13: 00007f9ebc6822a8 R14: 00007f9eb9f9b5e0 R15:
> 00007ffd9000eeb0
> [ 19.720179] ---[ end trace 46a3554c8816a293 ]---
> [ 31.611886] systemd-journald[600]: File
> /var/log/journal/fbbc68aec3984fd6b148a9830a1096e0/user-2000.journal
> corrupted or uncleanly shut down, renaming and replacing.
> [ 31.937861] ------------[ cut here ]------------

This is again the leaked acquire_ctx that isn't properly cleared due to
your reverts (well, my not-perfectly-bisectable patches).

I think it should be simple to type up a quick patch to make the vmwgfx
fbdev code work again, I'll submit that asap.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel