> -----Original Message----- > From: Mathias Fröhlich <Mathias.Froehlich@xxxxxxx> > Sent: Friday, April 5, 2019 1:13 AM > To: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Cc: linux-kernel@xxxxxxxxxxxxxxx; stable@xxxxxxxxxxxxxxx; Li, Sun peng (Leo) > <Sunpeng.Li@xxxxxxx>; Deucher, Alexander > <Alexander.Deucher@xxxxxxx>; Sasha Levin <sashal@xxxxxxxxxx> > Subject: Re: [PATCH 5.0 072/246] drm/amd/display: Fix reference counting > for struct dc_sink. > > Greg, > > as I mentioned in the commit message, I saw more fixes to that area in Alex > Deuchers queue when I fed that to Alex. There is one fix that I can think of > that interacts with my fixes. Means, we may get unwanted side effects of my > patch without the fix mentioned below. With that below patch also selected, > I think we should be ok for stable. > Alex, AMD people, your opinion? I'm not sure. I haven't gone through to figure out what combinations of all of these patches are required for each stable kernel. The combinatorics are too much which is why I only cc stable on certain patches. Harry or Leo may have be able to comment however. Alex > > The one that I can spot not already in linux-5.0.y is: > > commit 3f01f098a4e2ef30ef628497c43a3d568e720376 > Author: Jerry (Fangzhi) Zuo <Jerry.Zuo@xxxxxxx> > Date: Thu Jan 24 11:46:49 2019 -0500 > > drm/amd/display: Clear dc_sink after it gets released > > [Why] > The dc_sink was released but the pointer on the aconnector was > not cleared. > > [How] > Clear it. > > best > > Mathias > > > On Thursday, 4 April 2019 10:46:12 CEST Greg Kroah-Hartman wrote: > > 5.0-stable review patch. If anyone has any objections, please let me know. > > > > ------------------ > > > > [ Upstream commit dcd5fb82ffb484124203aa339733663ac0b059f3 ] > > > > Reference counting in amdgpu_dm_connector for > > amdgpu_dm_connector::dc_sink and > amdgpu_dm_connector::dc_em_sink as > > well as in dc_link::local_sink seems to be out of shape. Thus make > > reference counting consistent for these members and just plain > > increment the reference count when the variable gets assigned and > decrement when the pointer is set to zero or replaced. > > Also simplify reference counting in selected function sopes to be sure > > the reference is released in any case. In some cases add NULL pointer > > check before dereferencing. > > At a hand full of places a comment is placed to stat that the > > reference increment happened already somewhere else. > > > > This actually fixes the following kernel bug on my system when > > enabling display core in amdgpu. There are some more similar bug > > reports around, so it probably helps at more places. > > > > kernel BUG at mm/slub.c:294! > > invalid opcode: 0000 [#1] SMP PTI > > CPU: 9 PID: 1180 Comm: Xorg Not tainted 5.0.0-rc1+ #2 > > Hardware name: Supermicro X10DAi/X10DAI, BIOS 3.0a 02/05/2018 > > RIP: 0010:__slab_free+0x1e2/0x3d0 > > Code: 8b 54 24 30 48 89 4c 24 28 e8 da fb ff ff 4c 8b 54 24 28 85 c0 0f 85 67 fe > ff ff 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 49 3b 5c 24 28 75 ab 48 > 8b 44 24 30 49 89 4c 24 28 49 89 44 > > RSP: 0018:ffffb0978589fa90 EFLAGS: 00010246 > > RAX: ffff92f12806c400 RBX: 0000000080200019 RCX: ffff92f12806c400 > > RDX: ffff92f12806c400 RSI: ffffdd6421a01a00 RDI: ffff92ed2f406e80 > > RBP: ffffb0978589fb40 R08: 0000000000000001 R09: ffffffffc0ee4748 > > R10: ffff92f12806c400 R11: 0000000000000001 R12: ffffdd6421a01a00 > > R13: ffff92f12806c400 R14: ffff92ed2f406e80 R15: ffffdd6421a01a20 > > FS: 00007f4170be0ac0(0000) GS:ffff92ed2fb40000(0000) > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000562818aaa000 CR3: 000000045745a002 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > ? drm_dbg+0x87/0x90 [drm] > > dc_stream_release+0x28/0x50 [amdgpu] > > amdgpu_dm_connector_mode_valid+0xb4/0x1f0 [amdgpu] > > drm_helper_probe_single_connector_modes+0x492/0x6b0 > [drm_kms_helper] > > drm_mode_getconnector+0x457/0x490 [drm] > > ? drm_connector_property_set_ioctl+0x60/0x60 [drm] > > drm_ioctl_kernel+0xa9/0xf0 [drm] > > drm_ioctl+0x201/0x3a0 [drm] > > ? drm_connector_property_set_ioctl+0x60/0x60 [drm] > > amdgpu_drm_ioctl+0x49/0x80 [amdgpu] > > do_vfs_ioctl+0xa4/0x630 > > ? __sys_recvmsg+0x83/0xa0 > > ksys_ioctl+0x60/0x90 > > __x64_sys_ioctl+0x16/0x20 > > do_syscall_64+0x5b/0x160 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > RIP: 0033:0x7f417110809b > > Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 > 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b > 0d bd bd 0c 00 f7 d8 64 89 01 48 > > RSP: 002b:00007ffdd8d1c268 EFLAGS: 00000246 ORIG_RAX: > 0000000000000010 > > RAX: ffffffffffffffda RBX: 0000562818a8ebc0 RCX: 00007f417110809b > > RDX: 00007ffdd8d1c2a0 RSI: 00000000c05064a7 RDI: 0000000000000012 > > RBP: 00007ffdd8d1c2a0 R08: 0000562819012280 R09: 0000000000000007 > > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c05064a7 > > R13: 0000000000000012 R14: 0000000000000012 R15: 00007ffdd8d1c2a0 > > Modules linked in: nfsv4 dns_resolver nfs lockd grace fscache fuse > > vfat fat amdgpu intel_rapl sb_edac x86_pkg_temp_thermal > > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul > > chash gpu_sched crc32_pclmul snd_hda_codec_realtek > ghash_clmulni_intel > > amd_iommu_v2 iTCO_wdt iTCO_vendor_support ttm > snd_hda_codec_generic > > snd_hda_codec_hdmi ledtrig_audio snd_hda_intel drm_kms_helper > > snd_hda_codec intel_cstate snd_hda_core drm snd_hwdep snd_seq > > snd_seq_device intel_uncore snd_pcm intel_rapl_perf snd_timer snd > > soundcore ioatdma pcspkr intel_wmi_thunderbolt mxm_wmi i2c_i801 > > lpc_ich pcc_cpufreq auth_rpcgss sunrpc igb crc32c_intel i2c_algo_bit > > dca wmi hid_cherry analog gameport joydev > > > > This patch is based on agd5f/drm-next-5.1-wip. This patch does not > > require all of that, but agd5f/drm-next-5.1-wip contains at least one > > more dc_sink counting fix that I could spot. > > > > Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@xxxxxx> > > Reviewed-by: Leo Li <sunpeng.li@xxxxxxx> > > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> > > Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> > > --- > > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 43 > +++++++++++++++---- > > .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 1 + > > drivers/gpu/drm/amd/display/dc/core/dc_link.c | 1 + > > 3 files changed, 37 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > index 636d14a60952..6d77fd966dbd 100644 > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > @@ -886,6 +886,7 @@ static void emulated_link_detect(struct dc_link > *link) > > return; > > } > > > > + /* dc_sink_create returns a new reference */ > > link->local_sink = sink; > > > > edid_status = dm_helpers_read_local_edid( @@ -952,6 +953,8 @@ > static > > int dm_resume(void *handle) > > if (aconnector->fake_enable && aconnector->dc_link- > >local_sink) > > aconnector->fake_enable = false; > > > > + if (aconnector->dc_sink) > > + dc_sink_release(aconnector->dc_sink); > > aconnector->dc_sink = NULL; > > amdgpu_dm_update_connector_after_detect(aconnector); > > mutex_unlock(&aconnector->hpd_lock); > > @@ -1061,6 +1064,8 @@ > amdgpu_dm_update_connector_after_detect(struct > > amdgpu_dm_connector *aconnector) > > > > > > sink = aconnector->dc_link->local_sink; > > + if (sink) > > + dc_sink_retain(sink); > > > > /* > > * Edid mgmt connector gets first update only in mode_valid hook > and > > then @@ -1085,21 +1090,24 @@ > amdgpu_dm_update_connector_after_detect(struct > amdgpu_dm_connector *aconnector) > > * to it anymore after disconnect, so on next > crtc to connector > > * reshuffle by UMD we will get into > unwanted dc_sink release > > */ > > - if (aconnector->dc_sink != aconnector- > >dc_em_sink) > > - dc_sink_release(aconnector- > >dc_sink); > > + dc_sink_release(aconnector->dc_sink); > > } > > aconnector->dc_sink = sink; > > + dc_sink_retain(aconnector->dc_sink); > > amdgpu_dm_update_freesync_caps(connector, > > aconnector->edid); > > } else { > > amdgpu_dm_update_freesync_caps(connector, > NULL); > > - if (!aconnector->dc_sink) > > + if (!aconnector->dc_sink) { > > aconnector->dc_sink = aconnector- > >dc_em_sink; > > - else if (aconnector->dc_sink != aconnector- > >dc_em_sink) > > dc_sink_retain(aconnector->dc_sink); > > + } > > } > > > > mutex_unlock(&dev->mode_config.mutex); > > + > > + if (sink) > > + dc_sink_release(sink); > > return; > > } > > > > @@ -1107,8 +1115,10 @@ > amdgpu_dm_update_connector_after_detect(struct > amdgpu_dm_connector *aconnector) > > * TODO: temporary guard to look for proper fix > > * if this sink is MST sink, we should not do anything > > */ > > - if (sink && sink->sink_signal == SIGNAL_TYPE_DISPLAY_PORT_MST) > > + if (sink && sink->sink_signal == SIGNAL_TYPE_DISPLAY_PORT_MST) { > > + dc_sink_release(sink); > > return; > > + } > > > > if (aconnector->dc_sink == sink) { > > /* > > @@ -1117,6 +1127,8 @@ > amdgpu_dm_update_connector_after_detect(struct > amdgpu_dm_connector *aconnector) > > */ > > DRM_DEBUG_DRIVER("DCHPD: connector_id=%d: dc_sink > didn't change.\n", > > aconnector->connector_id); > > + if (sink) > > + dc_sink_release(sink); > > return; > > } > > > > @@ -1138,6 +1150,7 @@ > amdgpu_dm_update_connector_after_detect(struct > amdgpu_dm_connector *aconnector) > > amdgpu_dm_update_freesync_caps(connector, > NULL); > > > > aconnector->dc_sink = sink; > > + dc_sink_retain(aconnector->dc_sink); > > if (sink->dc_edid.length == 0) { > > aconnector->edid = NULL; > > drm_dp_cec_unset_edid(&aconnector- > >dm_dp_aux.aux); > > @@ -1158,11 +1171,15 @@ > amdgpu_dm_update_connector_after_detect(struct > amdgpu_dm_connector *aconnector) > > amdgpu_dm_update_freesync_caps(connector, NULL); > > drm_connector_update_edid_property(connector, NULL); > > aconnector->num_modes = 0; > > + dc_sink_release(aconnector->dc_sink); > > aconnector->dc_sink = NULL; > > aconnector->edid = NULL; > > } > > > > mutex_unlock(&dev->mode_config.mutex); > > + > > + if (sink) > > + dc_sink_release(sink); > > } > > > > static void handle_hpd_irq(void *param) @@ -2908,6 +2925,7 @@ > > create_stream_for_sink(struct amdgpu_dm_connector *aconnector, > > } > > } else { > > sink = aconnector->dc_sink; > > + dc_sink_retain(sink); > > } > > > > stream = dc_create_stream_for_sink(sink); @@ -2974,8 +2992,7 @@ > > create_stream_for_sink(struct amdgpu_dm_connector *aconnector, > > stream->ignore_msa_timing_param = true; > > > > finish: > > - if (sink && sink->sink_signal == SIGNAL_TYPE_VIRTUAL && > aconnector->base.force != DRM_FORCE_ON) > > - dc_sink_release(sink); > > + dc_sink_release(sink); > > > > return stream; > > } > > @@ -3233,6 +3250,14 @@ static void > amdgpu_dm_connector_destroy(struct drm_connector *connector) > > dm->backlight_dev = NULL; > > } > > #endif > > + > > + if (aconnector->dc_em_sink) > > + dc_sink_release(aconnector->dc_em_sink); > > + aconnector->dc_em_sink = NULL; > > + if (aconnector->dc_sink) > > + dc_sink_release(aconnector->dc_sink); > > + aconnector->dc_sink = NULL; > > + > > drm_dp_cec_unregister_connector(&aconnector- > >dm_dp_aux.aux); > > drm_connector_unregister(connector); > > drm_connector_cleanup(connector); > > @@ -3330,10 +3355,12 @@ static void create_eml_sink(struct > amdgpu_dm_connector *aconnector) > > (edid->extensions + 1) * EDID_LENGTH, > > &init_params); > > > > - if (aconnector->base.force == DRM_FORCE_ON) > > + if (aconnector->base.force == DRM_FORCE_ON) { > > aconnector->dc_sink = aconnector->dc_link->local_sink ? > > aconnector->dc_link->local_sink : > > aconnector->dc_em_sink; > > + dc_sink_retain(aconnector->dc_sink); > > + } > > } > > > > static void handle_edid_mgmt(struct amdgpu_dm_connector > *aconnector) > > diff --git > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > > index 1b0d209d8367..3b95a637b508 100644 > > --- > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > > +++ > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c > > @@ -239,6 +239,7 @@ static int dm_dp_mst_get_modes(struct > drm_connector *connector) > > &init_params); > > > > dc_sink->priv = aconnector; > > + /* dc_link_add_remote_sink returns a new reference */ > > aconnector->dc_sink = dc_sink; > > > > if (aconnector->dc_sink) > > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c > > b/drivers/gpu/drm/amd/display/dc/core/dc_link.c > > index b0265dbebd4c..583eb367850f 100644 > > --- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c > > +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c > > @@ -792,6 +792,7 @@ bool dc_link_detect(struct dc_link *link, enum > dc_detect_reason reason) > > sink->dongle_max_pix_clk = > sink_caps.max_hdmi_pixel_clock; > > sink->converter_disable_audio = converter_disable_audio; > > > > + /* dc_sink_create returns a new reference */ > > link->local_sink = sink; > > > > edid_status = dm_helpers_read_local_edid( > > > > >