[off topic: plain text mail please]

On Fri, 9 Aug 2019 12:41:42 +0000 Martin Wilck wrote:
>
> This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> machine. The X screen was frozen. Remote login via ssh was still
> possible, thus I was able to retrieve basic logs.

Thanks for the report.

>
> sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> minute, the same two processes were still blocked. KVM seems to try to
> acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> to complete on pages owned by the i915 driver.
>
> kcompactd stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
> Aug 09 12:12:48 apollon.suse.de kernel: task PC stack pid father
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0 D 0 43 2 0x80004000
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: io_schedule+0x12/0x40
> Aug 09 12:12:48 apollon.suse.de kernel: __lock_page+0x123/0x200
> Aug 09 12:12:48 apollon.suse.de kernel: ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: ? file_fdatawait_range+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: set_page_dirty_lock+0x49/0x50
> Aug 09 12:12:48 apollon.suse.de kernel: i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]

The two lines above show that commit aa56a292ce62 ("drm/i915/userptr:
Acquire the page lock around set_page_dirty()") is the culprit.
> Aug 09 12:12:48 apollon.suse.de kernel: __i915_gem_object_put_pages+0x5e/0xa0 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: ? __mod_lruvec_state+0x3f/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0

The page is already locked before try_to_unmap() is called, and a dirty
page table entry is handled in try_to_unmap_one(), so the page locking
added in aa56a292ce62 is overkill in this call trace: set_page_dirty_lock()
can end up waiting for a page lock that kcompactd itself already holds.
The bigger pain is that the commit cannot simply be reverted, given the
Fixes tag it carries.

> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd_do_work+0x120/0x290
>
> KVM stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM D 0 25189 1 0x00000320
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: schedule_preempt_disabled+0xa/0x10
> Aug 09 12:12:48 apollon.suse.de kernel: __mutex_lock.isra.0+0x172/0x4d0
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone_order+0xc6/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_compact_pages+0xcc/0x2a0
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_direct_compact+0x7c/0x150
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_slowpath+0x1ee/0xd00
> Aug 09 12:12:48 apollon.suse.de kernel: ? vmx_vcpu_load+0x100/0x120 [kvm_intel]
>
> Full logs can be found under https://pastebin.com/KJ6tccj4
> I haven't yet tried if this is reproducible.

Set the page dirty only if the page lock can be taken right away;
otherwise someone else is taking care of the page.
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -663,7 +663,7 @@ i915_gem_userptr_put_pages(struct drm_i9
 	i915_gem_gtt_finish_pages(obj, pages);
 
 	for_each_sgt_page(page, sgt_iter, pages) {
-		if (obj->mm.dirty)
+		if (obj->mm.dirty) {
 			/*
 			 * As this may not be anonymous memory (e.g. shmem)
 			 * but exist on a real mapping, we have to lock
@@ -672,8 +672,15 @@ i915_gem_userptr_put_pages(struct drm_i9
 			 * prevent the inode from being truncated.
 			 * Play safe and take the lock.
 			 */
-			set_page_dirty_lock(page);
-
+			if (trylock_page(page)) {
+				set_page_dirty(page);
+				unlock_page(page);
+			}
+			/*
+			 * else someone else is taking care of page and
+			 * we can do nothing about it to avoid deadlock
+			 */
+		}
 		mark_page_accessed(page);
 		put_page(page);
 	}
--
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx