Re: [PATCH 1/3] Revert "drm/dp_mst: Skip validating ports during destruction, just ref"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 29, 2018 at 9:46 AM Wentland, Harry <Harry.Wentland@xxxxxxx> wrote:
>
> On 2018-11-28 7:31 p.m., Guang Bai wrote:
> > From: Lyude Paul <lyude@xxxxxxxxxx>
> >
> > This reverts commit:
> >
> > c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> >
> > ugh.
> >
> > In drm_dp_destroy_connector_work(), we have a pretty good chance of
> > freeing the actual struct drm_dp_mst_port. However, after destroying
> > things we send a hotplug through (*mgr->cbs->hotplug)(mgr) which is
> > where the problems start.
> >
> > For i915, this calls all the way down to the fbcon probing helpers,
> > which start trying to access the port in a modeset.
> >
> > [   45.062001] ==================================================================
> > [   45.062112] BUG: KASAN: use-after-free in ex_handler_refcount+0x146/0x180
> > [   45.062196] Write of size 4 at addr ffff8882b4b70968 by task kworker/3:1/53
> >
> > [   45.062325] CPU: 3 PID: 53 Comm: kworker/3:1 Kdump: loaded Tainted: G           O      4.20.0-rc4Lyude-Test+ #3
> > [   45.062442] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> > [   45.062554] Workqueue: events drm_dp_destroy_connector_work [drm_kms_helper]
> > [   45.062641] Call Trace:
> > [   45.062685]  dump_stack+0xbd/0x15a
> > [   45.062735]  ? dump_stack_print_info.cold.0+0x1b/0x1b
> > [   45.062801]  ? printk+0x9f/0xc5
> > [   45.062847]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
> > [   45.062909]  ? ex_handler_refcount+0x146/0x180
> > [   45.062970]  print_address_description+0x71/0x239
> > [   45.063036]  ? ex_handler_refcount+0x146/0x180
> > [   45.063095]  kasan_report.cold.5+0x242/0x30b
> > [   45.063155]  __asan_report_store4_noabort+0x1c/0x20
> > [   45.063313]  ex_handler_refcount+0x146/0x180
> > [   45.063371]  ? ex_handler_clear_fs+0xb0/0xb0
> > [   45.063428]  fixup_exception+0x98/0xd7
> > [   45.063484]  ? raw_notifier_call_chain+0x20/0x20
> > [   45.063548]  do_trap+0x6d/0x210
> > [   45.063605]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [   45.063732]  do_error_trap+0xc0/0x170
> > [   45.063802]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [   45.063929]  do_invalid_op+0x3b/0x50
> > [   45.063997]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [   45.064103]  invalid_op+0x14/0x20
> > [   45.064162] RIP: 0010:_GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> > [   45.064274] Code: 00 48 c7 c7 80 fe 53 a0 48 89 e5 e8 5b 6f 26 e1 5d c3 48 8d 0e 0f 0b 48 8d 0b 0f 0b 48 8d 0f 0f 0b 48 8d 0f 0f 0b 49 8d 4d 00 <0f> 0b 49 8d 0e 0f 0b 48 8d 08 0f 0b 49 8d 4d 00 0f 0b 48 8d 0b 0f
> > [   45.064569] RSP: 0018:ffff8882b789ee10 EFLAGS: 00010282
> > [   45.064637] RAX: ffff8882af47ae70 RBX: ffff8882af47aa60 RCX: ffff8882b4b70968
> > [   45.064723] RDX: ffff8882af47ae70 RSI: 0000000000000008 RDI: ffff8882b788bdb8
> > [   45.064808] RBP: ffff8882b789ee28 R08: ffffed1056f13db4 R09: ffffed1056f13db3
> > [   45.064894] R10: ffffed1056f13db3 R11: ffff8882b789ed9f R12: ffff8882af47ad28
> > [   45.064980] R13: ffff8882b4b70968 R14: ffff8882acd86728 R15: ffff8882b4b75dc8
> > [   45.065084]  drm_dp_mst_reset_vcpi_slots+0x12/0x80 [drm_kms_helper]
> > [   45.065225]  intel_mst_disable_dp+0xda/0x180 [i915]
> > [   45.065361]  intel_encoders_disable.isra.107+0x197/0x310 [i915]
> > [   45.065498]  haswell_crtc_disable+0xbe/0x400 [i915]
> > [   45.065622]  ? i9xx_disable_plane+0x1c0/0x3e0 [i915]
> > [   45.065750]  intel_atomic_commit_tail+0x74e/0x3e60 [i915]
> > [   45.065884]  ? intel_pre_plane_update+0xbc0/0xbc0 [i915]
> > [   45.065968]  ? drm_atomic_helper_swap_state+0x88b/0x1d90 [drm_kms_helper]
> > [   45.066054]  ? kasan_check_write+0x14/0x20
> > [   45.066165]  ? i915_gem_track_fb+0x13a/0x330 [i915]
> > [   45.066277]  ? i915_sw_fence_complete+0xe9/0x140 [i915]
> > [   45.066406]  ? __i915_sw_fence_complete+0xc50/0xc50 [i915]
> > [   45.066540]  intel_atomic_commit+0x72e/0xef0 [i915]
> > [   45.066635]  ? drm_dev_dbg+0x200/0x200 [drm]
> > [   45.066764]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> > [   45.066898]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> > [   45.067001]  drm_atomic_commit+0xc4/0xf0 [drm]
> > [   45.067074]  restore_fbdev_mode_atomic+0x562/0x780 [drm_kms_helper]
> > [   45.067166]  ? drm_fb_helper_debug_leave+0x690/0x690 [drm_kms_helper]
> > [   45.067249]  ? kasan_check_read+0x11/0x20
> > [   45.067324]  restore_fbdev_mode+0x127/0x4b0 [drm_kms_helper]
> > [   45.067364]  ? kasan_check_read+0x11/0x20
> > [   45.067406]  drm_fb_helper_restore_fbdev_mode_unlocked+0x164/0x200 [drm_kms_helper]
> > [   45.067462]  ? drm_fb_helper_hotplug_event+0x30/0x30 [drm_kms_helper]
> > [   45.067508]  ? kasan_check_write+0x14/0x20
> > [   45.070360]  ? mutex_unlock+0x22/0x40
> > [   45.073748]  drm_fb_helper_set_par+0xb2/0xf0 [drm_kms_helper]
> > [   45.075846]  drm_fb_helper_hotplug_event.part.33+0x1cd/0x290 [drm_kms_helper]
> > [   45.078088]  drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
> > [   45.082614]  intel_fbdev_output_poll_changed+0x9f/0x140 [i915]
> > [   45.087069]  drm_kms_helper_hotplug_event+0x67/0x90 [drm_kms_helper]
> > [   45.089319]  intel_dp_mst_hotplug+0x37/0x50 [i915]
> > [   45.091496]  drm_dp_destroy_connector_work+0x510/0x6f0 [drm_kms_helper]
> > [   45.093675]  ? drm_dp_update_payload_part1+0x1220/0x1220 [drm_kms_helper]
> > [   45.095851]  ? kasan_check_write+0x14/0x20
> > [   45.098473]  ? kasan_check_read+0x11/0x20
> > [   45.101155]  ? strscpy+0x17c/0x530
> > [   45.103808]  ? __switch_to_asm+0x34/0x70
> > [   45.106456]  ? syscall_return_via_sysret+0xf/0x7f
> > [   45.109711]  ? read_word_at_a_time+0x20/0x20
> > [   45.113138]  ? __switch_to_asm+0x40/0x70
> > [   45.116529]  ? __switch_to_asm+0x34/0x70
> > [   45.119891]  ? __switch_to_asm+0x40/0x70
> > [   45.123224]  ? __switch_to_asm+0x34/0x70
> > [   45.126540]  ? __switch_to_asm+0x34/0x70
> > [   45.129824]  process_one_work+0x88d/0x15d0
> > [   45.133172]  ? pool_mayday_timeout+0x850/0x850
> > [   45.136459]  ? pci_mmcfg_check_reserved+0x110/0x128
> > [   45.139739]  ? wake_q_add+0xb0/0xb0
> > [   45.143010]  ? check_preempt_wakeup+0x652/0x1050
> > [   45.146304]  ? worker_enter_idle+0x29e/0x740
> > [   45.149589]  ? __schedule+0x1ec0/0x1ec0
> > [   45.152937]  ? kasan_check_read+0x11/0x20
> > [   45.156179]  ? _raw_spin_lock_irq+0xa3/0x130
> > [   45.159382]  ? _raw_read_unlock_irqrestore+0x30/0x30
> > [   45.162542]  ? kasan_check_write+0x14/0x20
> > [   45.165657]  worker_thread+0x1a5/0x1470
> > [   45.168725]  ? set_load_weight+0x2e0/0x2e0
> > [   45.171755]  ? process_one_work+0x15d0/0x15d0
> > [   45.174806]  ? __switch_to_asm+0x34/0x70
> > [   45.177645]  ? __switch_to_asm+0x40/0x70
> > [   45.180323]  ? __switch_to_asm+0x34/0x70
> > [   45.182936]  ? __switch_to_asm+0x40/0x70
> > [   45.185539]  ? __switch_to_asm+0x34/0x70
> > [   45.188100]  ? __switch_to_asm+0x40/0x70
> > [   45.190628]  ? __schedule+0x7d4/0x1ec0
> > [   45.193143]  ? save_stack+0xa9/0xd0
> > [   45.195632]  ? kasan_check_write+0x10/0x20
> > [   45.198162]  ? kasan_kmalloc+0xc4/0xe0
> > [   45.200609]  ? kmem_cache_alloc_trace+0xdd/0x190
> > [   45.203046]  ? kthread+0x9f/0x3b0
> > [   45.205470]  ? ret_from_fork+0x35/0x40
> > [   45.207876]  ? unwind_next_frame+0x43/0x50
> > [   45.210273]  ? __save_stack_trace+0x82/0x100
> > [   45.212658]  ? deactivate_slab.isra.67+0x3d4/0x580
> > [   45.215026]  ? default_wake_function+0x35/0x50
> > [   45.217399]  ? kasan_check_read+0x11/0x20
> > [   45.219825]  ? _raw_spin_lock_irqsave+0xae/0x140
> > [   45.222174]  ? __lock_text_start+0x8/0x8
> > [   45.224521]  ? replenish_dl_entity.cold.62+0x4f/0x4f
> > [   45.226868]  ? __kthread_parkme+0x87/0xf0
> > [   45.229200]  kthread+0x2f7/0x3b0
> > [   45.231557]  ? process_one_work+0x15d0/0x15d0
> > [   45.233923]  ? kthread_park+0x120/0x120
> > [   45.236249]  ret_from_fork+0x35/0x40
> >
> > [   45.240875] Allocated by task 242:
> > [   45.243136]  save_stack+0x43/0xd0
> > [   45.245385]  kasan_kmalloc+0xc4/0xe0
> > [   45.247597]  kmem_cache_alloc_trace+0xdd/0x190
> > [   45.249793]  drm_dp_add_port+0x1e0/0x2170 [drm_kms_helper]
> > [   45.252000]  drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
> > [   45.254389]  drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
> > [   45.256803]  drm_dp_mst_link_probe_work+0x6f/0xb0 [drm_kms_helper]
> > [   45.259200]  process_one_work+0x88d/0x15d0
> > [   45.261597]  worker_thread+0x1a5/0x1470
> > [   45.264038]  kthread+0x2f7/0x3b0
> > [   45.266371]  ret_from_fork+0x35/0x40
> >
> > [   45.270937] Freed by task 53:
> > [   45.273170]  save_stack+0x43/0xd0
> > [   45.275382]  __kasan_slab_free+0x139/0x190
> > [   45.277604]  kasan_slab_free+0xe/0x10
> > [   45.279826]  kfree+0x99/0x1b0
> > [   45.282044]  drm_dp_free_mst_port+0x4a/0x60 [drm_kms_helper]
> > [   45.284330]  drm_dp_destroy_connector_work+0x43e/0x6f0 [drm_kms_helper]
> > [   45.286660]  process_one_work+0x88d/0x15d0
> > [   45.288934]  worker_thread+0x1a5/0x1470
> > [   45.291231]  kthread+0x2f7/0x3b0
> > [   45.293547]  ret_from_fork+0x35/0x40
> >
> > [   45.298206] The buggy address belongs to the object at ffff8882b4b70968
> >                 which belongs to the cache kmalloc-2k of size 2048
> > [   45.303047] The buggy address is located 0 bytes inside of
> >                 2048-byte region [ffff8882b4b70968, ffff8882b4b71168)
> > [   45.308010] The buggy address belongs to the page:
> > [   45.310477] page:ffffea000ad2dc00 count:1 mapcount:0 mapping:ffff8882c080cf40 index:0x0 compound_mapcount: 0
> > [   45.313051] flags: 0x8000000000010200(slab|head)
> > [   45.315635] raw: 8000000000010200 ffffea000aac2808 ffffea000abe8608 ffff8882c080cf40
> > [   45.318300] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
> > [   45.320966] page dumped because: kasan: bad access detected
> >
> > [   45.326312] Memory state around the buggy address:
> > [   45.329085]  ffff8882b4b70800: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [   45.331845]  ffff8882b4b70880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [   45.334584] >ffff8882b4b70900: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
> > [   45.337302]                                                           ^
> > [   45.340061]  ffff8882b4b70980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [   45.342910]  ffff8882b4b70a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [   45.345748] ==================================================================
> >
> > So, this definitely isn't a fix that we want. This being said; there's
> > no real easy fix for this problem because of some of the catch-22's of
> > the MST helpers current design. For starters; we always need to validate
> > a port with drm_dp_get_validated_port_ref(), but validation relies on
> > the lifetime of the port in the actual topology. So once the port is
> > gone, it can't be validated again.
> >
> > If we were to try to make the payload helpers not use port validation,
> > then we'd cause another problem: if the port isn't validated, it could
> > be freed and we'd just start causing more KASAN issues. There are
> > already hacks that attempt to workaround this in
> > drm_dp_mst_destroy_connector_work() by re-initializing the kref so that
> > it can be used again and it's memory can be freed once the VCPI helpers
> > finish removing the port's respective payloads. But none of these really
> > do anything helpful since the port still can't be validated since it's
> > gone from the topology. Also, that workaround is immensely confusing to
> > read through.
> >
> > What really needs to be done in order to fix this is to teach DRM how to
> > track the lifetime of the structs for MST ports and branch devices
> > separately from their lifetime in the actual topology. Simply put; this
> > means having two different krefs-one that removes the port/branch device
> > from the topology, and one that finally calls kfree(). This would let us
> > simplify things, since we'd now be able to keep ports around without
> > having to keep them in the topology at the same time, which is exactly
> > what we need in order to teach our VCPI helpers to only validate ports
> > when it's actually necessary without running the risk of trying to use
> > unallocated memory.
> >
>
> I was hoping the current architecture could support what we're trying to do. It looked so close. This explains well why it can't and what we need. Thanks.
>
> > Such a fix is on it's way, but for now let's play it safe and just
> > revert this. If this bug has been around for well over a year, we can
> > wait a little while to get an actual proper fix here.
> >
> > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> > Fixes: c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> > Cc: Daniel Vetter <daniel@xxxxxxxx>
> > Cc: Sean Paul <sean@xxxxxxxxxx>
> > Cc: Jerry Zuo <Jerry.Zuo@xxxxxxx>
> > Cc: Harry Wentland <Harry.Wentland@xxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # v4.6+
> > Acked-by: Sean Paul <sean@xxxxxxxxxx>
>
> Reviewed-by: Harry Wentland <harry.wentland@xxxxxxx>
>

FTR, Lyude pushed this on Wednesday to drm-misc-fixes and it's in
Dave's latest PR for 4.20

Sean


> Harry
>
> > Link: https://patchwork.freedesktop.org/patch/msgid/20181128210005.24434-1-lyude@xxxxxxxxxx
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 15 ++-------------
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 250d716..0e0df39 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -1023,20 +1023,9 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
> >  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> >  {
> >       struct drm_dp_mst_port *rport = NULL;
> > -
> >       mutex_lock(&mgr->lock);
> > -     /*
> > -      * Port may or may not be 'valid' but we don't care about that when
> > -      * destroying the port and we are guaranteed that the port pointer
> > -      * will be valid until we've finished
> > -      */
> > -     if (current_work() == &mgr->destroy_connector_work) {
> > -             kref_get(&port->kref);
> > -             rport = port;
> > -     } else if (mgr->mst_primary) {
> > -             rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> > -                                                    port);
> > -     }
> > +     if (mgr->mst_primary)
> > +             rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> >       mutex_unlock(&mgr->lock);
> >       return rport;
> >  }
> >



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux