Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Hi Evan,

Yeah, exactly, that's what this warning is meant to prevent. Allocating temporary buffers for stuff like that is illegal during resume.

I strongly suggest just removing the MES test. It abuses the kernel ring interface in a way we didn't want anyway, and it is currently being replaced by Shashank's work.

Regards,
Christian.

On 10.02.23 at 05:12, Quan, Evan wrote:

[AMD Official Use Only - General]

 

Hi Jack,

 

Are you trying to fix the call trace below, which popped up on resume?

It seems MES created a BO for its self test and freed it later, in the final stage of the resume process.

All of this happened before the in_suspend flag was cleared, and that triggered the call trace.

Is my understanding correct?

 

[74084.799260] WARNING: CPU: 2 PID: 2891 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G        W IOE      6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 0000000000000000
[74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: ffffbed6812ebc50
[74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 00000000000001ff
[74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: ffffbed6812ebc70
[74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: ffff9639c7da5630
[74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) knlGS:0000000000000000
[74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 00000000003706e0
[74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[74085.037453] Call Trace:
[74085.039911]  <TASK>
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? kthread_complete_and_exit+0x20/0x20
[74085.110993]  ret_from_fork+0x1f/0x30
[74085.114576]  </TASK>
[74085.116773] ---[ end trace 0000000000000000 ]---
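
The sequence Evan describes can be modeled in a few lines of userspace C. This is a hypothetical sketch, not amdgpu code: `struct fake_adev`, `fake_bo_free_kernel()`, `fake_mes_self_test()`, `fake_suspend()` and `fake_resume()` are invented stand-ins, `WARN_ON()` is replaced by a counter, and it assumes (as stated elsewhere in this thread) that `in_suspend` is set at the start of suspend and cleared only at the end of resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant bits of struct amdgpu_device. */
struct fake_adev {
	bool in_suspend;	/* set at start of suspend, cleared at end of resume */
	int warnings;		/* counts what WARN_ON() would have reported */
};

/* Mirrors the unpatched check: any free while in_suspend is set warns. */
static void fake_bo_free_kernel(struct fake_adev *adev, void **bo)
{
	assert(adev != NULL && bo != NULL);
	if (*bo == NULL)
		return;
	if (adev->in_suspend) {
		adev->warnings++;
		fprintf(stderr, "WARNING: freeing BO while in_suspend is set\n");
	}
	free(*bo);
	*bo = NULL;
}

/* Hypothetical self test: allocates a temporary buffer and frees it again. */
static void fake_mes_self_test(struct fake_adev *adev)
{
	void *bo = malloc(4096);
	/* ... submit test work that uses bo ... */
	fake_bo_free_kernel(adev, &bo);
}

static void fake_suspend(struct fake_adev *adev)
{
	adev->in_suspend = true;
	/* ... IP blocks would be suspended here ... */
}

static void fake_resume(struct fake_adev *adev)
{
	/* ... IP blocks would be resumed here, then late_init runs ... */
	fake_mes_self_test(adev);	/* flag is still set at this point */
	adev->in_suspend = false;	/* cleared only at the very end */
}
```

Running `fake_suspend()` and then `fake_resume()` on a zero-initialized `struct fake_adev` bumps `warnings` to 1, matching the single warning in the trace; calling `fake_mes_self_test()` again once `in_suspend` is cleared does not warn.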

 

BR

Evan

From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Monday, February 6, 2023 5:00 PM
To: Xiao, Jack <Jack.Xiao@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

 

On 06.02.23 at 09:28, Xiao, Jack wrote:

>> >> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

>> The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can’t filter the case you mentioned.

> Why not? This is exactly what it should do.

 

[Jack] If resources are freed during resume, that should not hit the issue you described. But checking only the in_suspend flag would also flag these cases with a warning.


No, once more: Freeing up or allocating resources between suspend and resume is illegal!

If you free up a resource during resume, you should absolutely hit that warning; this is intentional!

Regards,
Christian.

 

Regards,

Jack

 

From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <Jack.Xiao@xxxxxxx>; Christian König <ckoenig.leichtzumerken@xxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

 

On 06.02.23 at 08:23, Xiao, Jack wrote:

>> Nope, that is not related to any hw state.

 

We can use another flag.

 

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can’t filter the case you mentioned.


Why not? This is exactly what it should do.

Do you know the root cause of the cases that hit this issue? Then we could pick an exact point at which to warn about the freeing.


Well, the root cause is programming errors. See, between suspending and resuming you should neither allocate nor free memory.

Otherwise we can run into trouble. This check is one part of that; we should probably also add a warning when memory is allocated. But this check here is certainly correct.
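
A minimal sketch of the rule Christian states, with the allocation-side warning he suggests placed next to the existing free-side one. `struct guarded_adev`, `guarded_bo_create()` and `guarded_bo_free()` are hypothetical userspace stand-ins, not the real `amdgpu_bo_create_kernel()`/`amdgpu_bo_free_kernel()`; each counter takes the place of a `WARN_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device state; not the real struct amdgpu_device. */
struct guarded_adev {
	bool in_suspend;
	int alloc_warnings;	/* allocations attempted inside the window */
	int free_warnings;	/* frees attempted inside the window */
};

/* Allocation-side check of the kind suggested above (hypothetical). */
static void *guarded_bo_create(struct guarded_adev *adev, size_t size)
{
	assert(adev != NULL);
	if (adev->in_suspend)
		adev->alloc_warnings++;	/* stands in for WARN_ON() */
	return malloc(size);
}

/* Free-side check, same condition as the existing WARN_ON(). */
static void guarded_bo_free(struct guarded_adev *adev, void **bo)
{
	assert(adev != NULL && bo != NULL);
	if (*bo == NULL)
		return;
	if (adev->in_suspend)
		adev->free_warnings++;	/* stands in for WARN_ON() */
	free(*bo);
	*bo = NULL;
}
```

A buffer created before suspend and simply reused across the suspend/resume window trips neither counter; a temporary buffer created and freed while `in_suspend` is set trips both.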

Regards,
Christian.

 

Thanks,

Jack

 

From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

 

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We've had a couple of cases now where this was wrong. If you get a warning from this, please fix the code that tried to free something during suspend instead.

Regards,
Christian.

On 03.02.23 at 07:04, Xiao, Jack wrote:

>> It's simply illegal to free up memory during suspend.

Why? My understanding is that the restriction comes from DMA being shut down.

 

Regards,

Jack

 

From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

 

Big NAK to this! This warning is not related in any way to the hw state.

 

It's simply illegal to free up memory during suspend.

 

Regards,

Christian.

 


From: Xiao, Jack <Jack.Xiao@xxxxxxx>
Sent: Thursday, February 2, 2023 10:54
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
Cc: Xiao, Jack <Jack.Xiao@xxxxxxx>
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

 

Reduce warnings: only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;
 
-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3
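
The practical difference between the existing check and the one proposed in this patch can be written out as two predicates. This is a hypothetical userspace sketch: `struct free_ctx` stands in for the two values the patched `WARN_ON()` reads (`in_suspend` and the SDMA block's `status.hw`), and the helper names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot of the two inputs the patch looks at:
 * adev->in_suspend and the SDMA IP block's status.hw flag.
 */
struct free_ctx {
	bool in_suspend;
	bool sdma_hw_up;
};

/* Condition of the existing WARN_ON() in amdgpu_bo_free_kernel(). */
static bool warns_original(struct free_ctx c)
{
	return c.in_suspend;
}

/* Condition proposed by the patch: warn only if SDMA is also down. */
static bool warns_patched(struct free_ctx c)
{
	return c.in_suspend && !c.sdma_hw_up;
}
```

The two predicates disagree only when `in_suspend` is still set but SDMA has already been brought back up. That is presumably the state during `late_init` on resume, where the MES self test frees its buffer, and it is the case Christian argues must keep warning.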

 

 

 


