On Thu, Nov 29, 2018 at 9:46 AM Wentland, Harry <Harry.Wentland@xxxxxxx> wrote:
>
> On 2018-11-28 7:31 p.m., Guang Bai wrote:
> > From: Lyude Paul <lyude@xxxxxxxxxx>
> >
> > This reverts commit:
> >
> > c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> >
> > ugh.
> >
> > In drm_dp_destroy_connector_work(), we have a pretty good chance of
> > freeing the actual struct drm_dp_mst_port. However, after destroying
> > things we send a hotplug through (*mgr->cbs->hotplug)(mgr) which is
> > where the problems start.
> >
> > For i915, this calls all the way down to the fbcon probing helpers,
> > which start trying to access the port in a modeset.
> >
> > [ 45.062001] ==================================================================
> > [ 45.062112] BUG: KASAN: use-after-free in ex_handler_refcount+0x146/0x180
> > [ 45.062196] Write of size 4 at addr ffff8882b4b70968 by task kworker/3:1/53
> >
> > [ 45.062325] CPU: 3 PID: 53 Comm: kworker/3:1 Kdump: loaded Tainted: G O 4.20.0-rc4Lyude-Test+ #3
> > [ 45.062442] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> > [ 45.062554] Workqueue: events drm_dp_destroy_connector_work [drm_kms_helper]
> > [ 45.062641] Call Trace:
> > [ 45.062685] dump_stack+0xbd/0x15a
> > [ 45.062735] ? dump_stack_print_info.cold.0+0x1b/0x1b
> > [ 45.062801] ? printk+0x9f/0xc5
> > [ 45.062847] ? kmsg_dump_rewind_nolock+0xe4/0xe4
> > [ 45.062909] ? ex_handler_refcount+0x146/0x180
> > [ 45.062970] print_address_description+0x71/0x239
> > [ 45.063036] ? ex_handler_refcount+0x146/0x180
> > [ 45.063095] kasan_report.cold.5+0x242/0x30b
> > [ 45.063155] __asan_report_store4_noabort+0x1c/0x20
> > [ 45.063313] ex_handler_refcount+0x146/0x180
> > [ 45.063371] ? ex_handler_clear_fs+0xb0/0xb0
> > [ 45.063428] fixup_exception+0x98/0xd7
> > [ 45.063484] ? raw_notifier_call_chain+0x20/0x20
> > [ 45.063548] do_trap+0x6d/0x210
> > [ 45.063605] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [ 45.063732] do_error_trap+0xc0/0x170
> > [ 45.063802] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [ 45.063929] do_invalid_op+0x3b/0x50
> > [ 45.063997] ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [ 45.064103] invalid_op+0x14/0x20
> > [ 45.064162] RIP: 0010:_GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [ 45.064274] Code: 00 48 c7 c7 80 fe 53 a0 48 89 e5 e8 5b 6f 26 e1 5d c3 48 8d 0e 0f 0b 48 8d 0b 0f 0b 48 8d 0f 0f 0b 48 8d 0f 0f 0b 49 8d 4d 00 <0f> 0b 49 8d 0e 0f 0b 48 8d 08 0f 0b 49 8d 4d 00 0f 0b 48 8d 0b 0f
> > [ 45.064569] RSP: 0018:ffff8882b789ee10 EFLAGS: 00010282
> > [ 45.064637] RAX: ffff8882af47ae70 RBX: ffff8882af47aa60 RCX: ffff8882b4b70968
> > [ 45.064723] RDX: ffff8882af47ae70 RSI: 0000000000000008 RDI: ffff8882b788bdb8
> > [ 45.064808] RBP: ffff8882b789ee28 R08: ffffed1056f13db4 R09: ffffed1056f13db3
> > [ 45.064894] R10: ffffed1056f13db3 R11: ffff8882b789ed9f R12: ffff8882af47ad28
> > [ 45.064980] R13: ffff8882b4b70968 R14: ffff8882acd86728 R15: ffff8882b4b75dc8
> > [ 45.065084] drm_dp_mst_reset_vcpi_slots+0x12/0x80 [drm_kms_helper]
> > [ 45.065225] intel_mst_disable_dp+0xda/0x180 [i915]
> > [ 45.065361] intel_encoders_disable.isra.107+0x197/0x310 [i915]
> > [ 45.065498] haswell_crtc_disable+0xbe/0x400 [i915]
> > [ 45.065622] ? i9xx_disable_plane+0x1c0/0x3e0 [i915]
> > [ 45.065750] intel_atomic_commit_tail+0x74e/0x3e60 [i915]
> > [ 45.065884] ? intel_pre_plane_update+0xbc0/0xbc0 [i915]
> > [ 45.065968] ? drm_atomic_helper_swap_state+0x88b/0x1d90 [drm_kms_helper]
> > [ 45.066054] ? kasan_check_write+0x14/0x20
> > [ 45.066165] ? i915_gem_track_fb+0x13a/0x330 [i915]
> > [ 45.066277] ? i915_sw_fence_complete+0xe9/0x140 [i915]
> > [ 45.066406] ? __i915_sw_fence_complete+0xc50/0xc50 [i915]
> > [ 45.066540] intel_atomic_commit+0x72e/0xef0 [i915]
> > [ 45.066635] ? drm_dev_dbg+0x200/0x200 [drm]
> > [ 45.066764] ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> > [ 45.066898] ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> > [ 45.067001] drm_atomic_commit+0xc4/0xf0 [drm]
> > [ 45.067074] restore_fbdev_mode_atomic+0x562/0x780 [drm_kms_helper]
> > [ 45.067166] ? drm_fb_helper_debug_leave+0x690/0x690 [drm_kms_helper]
> > [ 45.067249] ? kasan_check_read+0x11/0x20
> > [ 45.067324] restore_fbdev_mode+0x127/0x4b0 [drm_kms_helper]
> > [ 45.067364] ? kasan_check_read+0x11/0x20
> > [ 45.067406] drm_fb_helper_restore_fbdev_mode_unlocked+0x164/0x200 [drm_kms_helper]
> > [ 45.067462] ? drm_fb_helper_hotplug_event+0x30/0x30 [drm_kms_helper]
> > [ 45.067508] ? kasan_check_write+0x14/0x20
> > [ 45.070360] ? mutex_unlock+0x22/0x40
> > [ 45.073748] drm_fb_helper_set_par+0xb2/0xf0 [drm_kms_helper]
> > [ 45.075846] drm_fb_helper_hotplug_event.part.33+0x1cd/0x290 [drm_kms_helper]
> > [ 45.078088] drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
> > [ 45.082614] intel_fbdev_output_poll_changed+0x9f/0x140 [i915]
> > [ 45.087069] drm_kms_helper_hotplug_event+0x67/0x90 [drm_kms_helper]
> > [ 45.089319] intel_dp_mst_hotplug+0x37/0x50 [i915]
> > [ 45.091496] drm_dp_destroy_connector_work+0x510/0x6f0 [drm_kms_helper]
> > [ 45.093675] ? drm_dp_update_payload_part1+0x1220/0x1220 [drm_kms_helper]
> > [ 45.095851] ? kasan_check_write+0x14/0x20
> > [ 45.098473] ? kasan_check_read+0x11/0x20
> > [ 45.101155] ? strscpy+0x17c/0x530
> > [ 45.103808] ? __switch_to_asm+0x34/0x70
> > [ 45.106456] ? syscall_return_via_sysret+0xf/0x7f
> > [ 45.109711] ? read_word_at_a_time+0x20/0x20
> > [ 45.113138] ? __switch_to_asm+0x40/0x70
> > [ 45.116529] ? __switch_to_asm+0x34/0x70
> > [ 45.119891] ? __switch_to_asm+0x40/0x70
> > [ 45.123224] ? __switch_to_asm+0x34/0x70
> > [ 45.126540] ? __switch_to_asm+0x34/0x70
> > [ 45.129824] process_one_work+0x88d/0x15d0
> > [ 45.133172] ? pool_mayday_timeout+0x850/0x850
> > [ 45.136459] ? pci_mmcfg_check_reserved+0x110/0x128
> > [ 45.139739] ? wake_q_add+0xb0/0xb0
> > [ 45.143010] ? check_preempt_wakeup+0x652/0x1050
> > [ 45.146304] ? worker_enter_idle+0x29e/0x740
> > [ 45.149589] ? __schedule+0x1ec0/0x1ec0
> > [ 45.152937] ? kasan_check_read+0x11/0x20
> > [ 45.156179] ? _raw_spin_lock_irq+0xa3/0x130
> > [ 45.159382] ? _raw_read_unlock_irqrestore+0x30/0x30
> > [ 45.162542] ? kasan_check_write+0x14/0x20
> > [ 45.165657] worker_thread+0x1a5/0x1470
> > [ 45.168725] ? set_load_weight+0x2e0/0x2e0
> > [ 45.171755] ? process_one_work+0x15d0/0x15d0
> > [ 45.174806] ? __switch_to_asm+0x34/0x70
> > [ 45.177645] ? __switch_to_asm+0x40/0x70
> > [ 45.180323] ? __switch_to_asm+0x34/0x70
> > [ 45.182936] ? __switch_to_asm+0x40/0x70
> > [ 45.185539] ? __switch_to_asm+0x34/0x70
> > [ 45.188100] ? __switch_to_asm+0x40/0x70
> > [ 45.190628] ? __schedule+0x7d4/0x1ec0
> > [ 45.193143] ? save_stack+0xa9/0xd0
> > [ 45.195632] ? kasan_check_write+0x10/0x20
> > [ 45.198162] ? kasan_kmalloc+0xc4/0xe0
> > [ 45.200609] ? kmem_cache_alloc_trace+0xdd/0x190
> > [ 45.203046] ? kthread+0x9f/0x3b0
> > [ 45.205470] ? ret_from_fork+0x35/0x40
> > [ 45.207876] ? unwind_next_frame+0x43/0x50
> > [ 45.210273] ? __save_stack_trace+0x82/0x100
> > [ 45.212658] ? deactivate_slab.isra.67+0x3d4/0x580
> > [ 45.215026] ? default_wake_function+0x35/0x50
> > [ 45.217399] ? kasan_check_read+0x11/0x20
> > [ 45.219825] ? _raw_spin_lock_irqsave+0xae/0x140
> > [ 45.222174] ? __lock_text_start+0x8/0x8
> > [ 45.224521] ? replenish_dl_entity.cold.62+0x4f/0x4f
> > [ 45.226868] ? __kthread_parkme+0x87/0xf0
> > [ 45.229200] kthread+0x2f7/0x3b0
> > [ 45.231557] ? process_one_work+0x15d0/0x15d0
> > [ 45.233923] ? kthread_park+0x120/0x120
> > [ 45.236249] ret_from_fork+0x35/0x40
> >
> > [ 45.240875] Allocated by task 242:
> > [ 45.243136] save_stack+0x43/0xd0
> > [ 45.245385] kasan_kmalloc+0xc4/0xe0
> > [ 45.247597] kmem_cache_alloc_trace+0xdd/0x190
> > [ 45.249793] drm_dp_add_port+0x1e0/0x2170 [drm_kms_helper]
> > [ 45.252000] drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
> > [ 45.254389] drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
> > [ 45.256803] drm_dp_mst_link_probe_work+0x6f/0xb0 [drm_kms_helper]
> > [ 45.259200] process_one_work+0x88d/0x15d0
> > [ 45.261597] worker_thread+0x1a5/0x1470
> > [ 45.264038] kthread+0x2f7/0x3b0
> > [ 45.266371] ret_from_fork+0x35/0x40
> >
> > [ 45.270937] Freed by task 53:
> > [ 45.273170] save_stack+0x43/0xd0
> > [ 45.275382] __kasan_slab_free+0x139/0x190
> > [ 45.277604] kasan_slab_free+0xe/0x10
> > [ 45.279826] kfree+0x99/0x1b0
> > [ 45.282044] drm_dp_free_mst_port+0x4a/0x60 [drm_kms_helper]
> > [ 45.284330] drm_dp_destroy_connector_work+0x43e/0x6f0 [drm_kms_helper]
> > [ 45.286660] process_one_work+0x88d/0x15d0
> > [ 45.288934] worker_thread+0x1a5/0x1470
> > [ 45.291231] kthread+0x2f7/0x3b0
> > [ 45.293547] ret_from_fork+0x35/0x40
> >
> > [ 45.298206] The buggy address belongs to the object at ffff8882b4b70968
> >  which belongs to the cache kmalloc-2k of size 2048
> > [ 45.303047] The buggy address is located 0 bytes inside of
> >  2048-byte region [ffff8882b4b70968, ffff8882b4b71168)
> > [ 45.308010] The buggy address belongs to the page:
> > [ 45.310477] page:ffffea000ad2dc00 count:1 mapcount:0 mapping:ffff8882c080cf40 index:0x0 compound_mapcount: 0
> > [ 45.313051] flags: 0x8000000000010200(slab|head)
> > [ 45.315635] raw: 8000000000010200 ffffea000aac2808 ffffea000abe8608 ffff8882c080cf40
> > [ 45.318300] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
> > [ 45.320966] page dumped because: kasan: bad access detected
> >
> > [ 45.326312] Memory state around the buggy address:
> > [ 45.329085] ffff8882b4b70800: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [ 45.331845] ffff8882b4b70880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [ 45.334584] >ffff8882b4b70900: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
> > [ 45.337302] ^
> > [ 45.340061] ffff8882b4b70980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [ 45.342910] ffff8882b4b70a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [ 45.345748] ==================================================================
> >
> > So, this definitely isn't a fix that we want. That said, there's no
> > real easy fix for this problem because of some of the catch-22s of
> > the MST helpers' current design. For starters, we always need to
> > validate a port with drm_dp_get_validated_port_ref(), but validation
> > relies on the lifetime of the port in the actual topology. So once the
> > port is gone, it can't be validated again.
> >
> > If we were to try to make the payload helpers not use port validation,
> > then we'd cause another problem: if the port isn't validated, it could
> > be freed and we'd just start causing more KASAN issues. There are
> > already hacks that attempt to work around this in
> > drm_dp_destroy_connector_work() by re-initializing the kref so that
> > it can be used again and its memory can be freed once the VCPI helpers
> > finish removing the port's respective payloads. But none of these really
> > do anything helpful since the port still can't be validated since it's
> > gone from the topology. Also, that workaround is immensely confusing to
> > read through.
> >
> > What really needs to be done in order to fix this is to teach DRM how to
> > track the lifetime of the structs for MST ports and branch devices
> > separately from their lifetime in the actual topology. Simply put, this
> > means having two different krefs: one that removes the port/branch device
> > from the topology, and one that finally calls kfree(). This would let us
> > simplify things, since we'd now be able to keep ports around without
> > having to keep them in the topology at the same time, which is exactly
> > what we need in order to teach our VCPI helpers to only validate ports
> > when it's actually necessary without running the risk of trying to use
> > unallocated memory.
> >
>
> I was hoping the current architecture could support what we're trying
> to do. It looked so close. This explains well why it can't and what we
> need. Thanks.
>
> > Such a fix is on its way, but for now let's play it safe and just
> > revert this. If this bug has been around for well over a year, we can
> > wait a little while to get an actual proper fix here.
> >
> > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> > Fixes: c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> > Cc: Daniel Vetter <daniel@xxxxxxxx>
> > Cc: Sean Paul <sean@xxxxxxxxxx>
> > Cc: Jerry Zuo <Jerry.Zuo@xxxxxxx>
> > Cc: Harry Wentland <Harry.Wentland@xxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # v4.6+
> > Acked-by: Sean Paul <sean@xxxxxxxxxx>
>
> Reviewed-by: Harry Wentland <harry.wentland@xxxxxxx>
>

FTR, Lyude pushed this on Wednesday to drm-misc-fixes and it's in Dave's
latest PR for 4.20.

Sean

> Harry
>
> > Link: https://patchwork.freedesktop.org/patch/msgid/20181128210005.24434-1-lyude@xxxxxxxxxx
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 15 ++-------------
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 250d716..0e0df39 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -1023,20 +1023,9 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> >  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> >  {
> >  	struct drm_dp_mst_port *rport = NULL;
> > -
> >  	mutex_lock(&mgr->lock);
> > -	/*
> > -	 * Port may or may not be 'valid' but we don't care about that when
> > -	 * destroying the port and we are guaranteed that the port pointer
> > -	 * will be valid until we've finished
> > -	 */
> > -	if (current_work() == &mgr->destroy_connector_work) {
> > -		kref_get(&port->kref);
> > -		rport = port;
> > -	} else if (mgr->mst_primary) {
> > -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > -						       port);
> > -	}
> > +	if (mgr->mst_primary)
> > +		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> >  	mutex_unlock(&mgr->lock);
> >  	return rport;
> >  }
> >
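
For anyone reading this thread later: the "two different krefs" idea from the
commit message can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the eventual kernel fix. Every name
below (struct sketch_port, the sketch_port_*() helpers, sketch_ports_freed) is
hypothetical, plain ints stand in for struct kref, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MST port; NOT the real struct drm_dp_mst_port. */
struct sketch_port {
	int topology_refs;	/* users that need the port to be in the topology */
	int malloc_refs;	/* users that only need the memory to stay allocated */
	bool in_topology;	/* cleared when the last topology ref is dropped */
};

/* Counts frees so the behaviour can be observed without touching freed memory. */
static int sketch_ports_freed;

static struct sketch_port *sketch_port_create(void)
{
	struct sketch_port *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->topology_refs = 1;
	port->malloc_refs = 1;	/* held on behalf of the topology refs */
	port->in_topology = true;
	return port;
}

/* Keep the memory alive; always succeeds, even after removal from the topology. */
static void sketch_port_malloc_get(struct sketch_port *port)
{
	port->malloc_refs++;
}

/* Drop a memory ref; the last one frees the struct. */
static void sketch_port_malloc_put(struct sketch_port *port)
{
	assert(port->malloc_refs > 0);
	if (--port->malloc_refs == 0) {
		sketch_ports_freed++;
		free(port);
	}
}

/* "Validation": only succeeds while the port is still part of the topology. */
static bool sketch_port_topology_get(struct sketch_port *port)
{
	if (!port->in_topology)
		return false;
	port->topology_refs++;
	return true;
}

/* Drop a topology ref; the last one removes the port from the topology
 * and releases the memory ref that the topology was holding. */
static void sketch_port_topology_put(struct sketch_port *port)
{
	assert(port->topology_refs > 0);
	if (--port->topology_refs == 0) {
		port->in_topology = false;
		sketch_port_malloc_put(port);
	}
}
```

In this sketch a payload helper would take sketch_port_malloc_get() before the
port is unplugged. After the last sketch_port_topology_put() the port can no
longer be validated, but the pointer stays safe to dereference until the helper
calls sketch_port_malloc_put(), which is the property the use-after-free above
is missing.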