Unfortunately, this patch does not work either:

[ 0.859998] ------------[ cut here ]------------
[ 0.859998] trying to bind memory to uninitialized GART !
[ 0.860003] WARNING: CPU: 13 PID: 235 at drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254 amdgpu_gart_bind+0x29/0x40 [amdgpu]
[ 0.860099] Modules linked in: amdgpu(+) drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_kms_helper syscopyarea hid_sensor_hub sysfillrect mfd_core sysimgblt hid_generic fb_sys_fops cec xhci_pci xhci_hcd nvme drm r8169 nvme_core psmouse crc32c_intel realtek amd_sfh usbcore i2c_hid_acpi mdio_devres t10_pi crc_t10dif i2c_hid i2c_piix4 crct10dif_generic libphy crct10dif_common hid backlight i2c_designware_platform i2c_designware_core
[ 0.860113] CPU: 13 PID: 235 Comm: systemd-udevd Not tainted 5.13.0+ #15
[ 0.860115] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[ 0.860116] RIP: 0010:amdgpu_gart_bind+0x29/0x40 [amdgpu]
[ 0.860210] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00 00 4d 85 c9 74 05 e9 16 ff ff ff 31 c0 c3 48 c7 c7 08 06 7d c0 e8 8e cc 31 e2 <0f> 0b b8 ea ff ff ff c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
[ 0.860212] RSP: 0018:ffffbb9e80b6f968 EFLAGS: 00010286
[ 0.860213] RAX: 0000000000000000 RBX: 0000000000000067 RCX: ffffffffa3080968
[ 0.860214] RDX: 0000000000000000 RSI: 00000000ffffefff RDI: ffffffffa3028960
[ 0.860215] RBP: ffff947c91e49a80 R08: 0000000000000000 R09: ffffbb9e80b6f798
[ 0.860215] R10: ffffbb9e80b6f790 R11: ffffffffa30989a8 R12: 0000000000000000
[ 0.860216] R13: ffff947c8a740000 R14: ffff947c8a740000 R15: 0000000000000000
[ 0.860216] FS: 00007f60a3c918c0(0000) GS:ffff947f5e940000(0000) knlGS:0000000000000000
[ 0.860217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.860218] CR2: 00007f60a4213480 CR3: 0000000135ee2000 CR4: 0000000000550ee0
[ 0.860218] PKRU: 55555554
[ 0.860219] Call Trace:
[ 0.860221] amdgpu_ttm_gart_bind+0x74/0xc0 [amdgpu]
[ 0.860305] amdgpu_ttm_alloc_gart+0x13e/0x190 [amdgpu]
[ 0.860385] amdgpu_bo_create_reserved.part.0+0xf3/0x1b0 [amdgpu]
[ 0.860465] ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
[ 0.860554] amdgpu_bo_create_kernel+0x36/0xa0 [amdgpu]
[ 0.860641] amdgpu_ttm_init.cold+0x167/0x181 [amdgpu]
[ 0.860784] gmc_v10_0_sw_init+0x2d7/0x430 [amdgpu]
[ 0.860889] amdgpu_device_init.cold+0x147f/0x1ad7 [amdgpu]
[ 0.861007] ? acpi_ns_get_node+0x4a/0x55
[ 0.861011] ? acpi_get_handle+0x89/0xb2
[ 0.861012] amdgpu_driver_load_kms+0x55/0x290 [amdgpu]
[ 0.861098] amdgpu_pci_probe+0x181/0x250 [amdgpu]
[ 0.861188] pci_device_probe+0xcd/0x140
[ 0.861191] really_probe+0xed/0x460
[ 0.861193] driver_probe_device+0xe3/0x150
[ 0.861195] device_driver_attach+0x9c/0xb0
[ 0.861196] __driver_attach+0x8a/0x150
[ 0.861197] ? device_driver_attach+0xb0/0xb0
[ 0.861198] ? device_driver_attach+0xb0/0xb0
[ 0.861198] bus_for_each_dev+0x73/0xb0
[ 0.861200] bus_add_driver+0x121/0x1e0
[ 0.861201] driver_register+0x8a/0xe0
[ 0.861202] ? 0xffffffffc1117000
[ 0.861203] do_one_initcall+0x47/0x180
[ 0.861205] ? do_init_module+0x19/0x230
[ 0.861208] ? kmem_cache_alloc+0x182/0x260
[ 0.861210] do_init_module+0x51/0x230
[ 0.861211] __do_sys_finit_module+0xb1/0x110
[ 0.861213] do_syscall_64+0x40/0xb0
[ 0.861216] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 0.861218] RIP: 0033:0x7f60a4149679
[ 0.861220] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
[ 0.861221] RSP: 002b:00007ffe25f17ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 0.861223] RAX: ffffffffffffffda RBX: 000056004a10a660 RCX: 00007f60a4149679
[ 0.861224] RDX: 0000000000000000 RSI: 00007f60a42e9eed RDI: 0000000000000016
[ 0.861224] RBP: 0000000000020000 R08: 0000000000000000 R09: 000056004a105980
[ 0.861225] R10: 0000000000000016 R11: 0000000000000246 R12: 00007f60a42e9eed
[ 0.861225] R13: 0000000000000000 R14: 000056004a0efdd0 R15: 000056004a10a660
[ 0.861226] ---[ end trace 0319f26df48f8ef0 ]---
[ 0.861228] [drm:amdgpu_ttm_gart_bind [amdgpu]] *ERROR* failed to bind 1 pages at 0x00400000
[ 0.861540] amdgpu 0000:03:00.0: amdgpu: 00000000a9dfe17c bind failed

On Wednesday, 19.01.2022 at 19:54 -0500, Alex Deucher wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Bert Karwatzki <spasswolf@xxxxxx> wrote:
> >
> > Bisected the error and found the first bad commit to be
> > d015e9861e55928a78137a2c95897bc50637fc47 is the first bad commit
> > commit d015e9861e55928a78137a2c95897bc50637fc47
> > Author: Jonathan Kim <jonathan.kim@xxxxxxx>
> > Date: Thu Dec 9 16:48:56 2021 -0500
> >
> > drm/amdgpu: improve debug VRAM access performance using sdma
> >
> > For better performance during VRAM access for debugged processes, do
> > read/write copies over SDMA.
> >
> > In order to fulfill post mortem debugging on a broken device, fallback to
> > stable MMIO access when gpu recovery is disabled or when job submission
> > time outs are set to max. Failed SDMA access should automatically fall
> > back to MMIO access.
> >
> > Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
> > page-table updates and TLB flushes on access.
> >
> > Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx>
> > Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
> >
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78 +++++++++++++++++++++++++++++++++
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++
> > 2 files changed, 82 insertions(+)
>
> Should be fixed with:
> https://patchwork.freedesktop.org/patch/470069/
>
> Alex
>
> >
> > On Thursday, 20.01.2022 at 00:22 +0100, Bert Karwatzki wrote:
> > > Reverting commit 72f686438de13f121c52f58d7445570a33dfdc61 does not
> > > change the errors:
> > > [ 1.310550] ------------[ cut here ]------------
> > > [ 1.310551] trying to bind memory to uninitialized GART !
> > > [ 1.310556] WARNING: CPU: 9 PID: 252 at drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254 amdgpu_gart_bind+0x2e/0x40 [amdgpu]
> > > [ 1.310659] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit drm_ttm_helper hid_sensor_hub ttm hid_generic nvme drm_kms_helper nvme_core cec xhci_pci t10_pi r8169 rc_core crc32_pclmul crc_t10dif i2c_hid_acpi realtek xhci_hcd psmouse crc32c_intel crct10dif_generic i2c_hid amd_sfh mdio_devres crct10dif_pclmul drm i2c_piix4 usbcore libphy crct10dif_common wmi button battery video fjes(-) hid
> > > [ 1.310672] CPU: 9 PID: 252 Comm: systemd-udevd Not tainted 5.13.0+ #4
> > > [ 1.310673] Hardware name: Micro-Star International Co., Ltd.
> > > Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> > > [ 1.310674] RIP: 0010:amdgpu_gart_bind+0x2e/0x40 [amdgpu]
> > > [ 1.310762] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00 00 4d 85 c9 74 05 e9 01 ff ff ff 31 c0 c3 48 c7 c7 68 36 dd c0 e8 86 db 19 e8 <0f> 0b b8 ea ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> > > [ 1.310763] RSP: 0018:ffffb19d00c33920 EFLAGS: 00010282
> > > [ 1.310764] RAX: 0000000000000000 RBX: 0000000000000067 RCX: ffffffffa9abb208
> > > [ 1.310765] RDX: 0000000000000000 RSI: 00000000ffffefff RDI: ffffffffa9a63200
> > > [ 1.310766] RBP: ffff985ce2a796c0 R08: 0000000000000000 R09: ffffb19d00c33748
> > > [ 1.310766] R10: ffffb19d00c33740 R11: ffffffffa9ad3248 R12: 0000000000000000
> > > [ 1.310766] R13: ffff985cd45a0000 R14: ffff985cd45a0000 R15: 0000000000000000
> > > [ 1.310767] FS: 00007f69fabdc8c0(0000) GS:ffff985f9e640000(0000) knlGS:0000000000000000
> > > [ 1.310768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1.310768] CR2: 00007f69fabc5dca CR3: 00000001139ec000 CR4: 0000000000750ee0
> > > [ 1.310769] PKRU: 55555554
> > > [ 1.310770] Call Trace:
> > > [ 1.310772] amdgpu_ttm_gart_bind+0x79/0xc0 [amdgpu]
> > > [ 1.310858] amdgpu_ttm_alloc_gart+0x146/0x1a0 [amdgpu]
> > > [ 1.310942] amdgpu_bo_create_reserved.part.0+0xf8/0x1b0 [amdgpu]
> > > [ 1.311025] ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> > > [ 1.311145] amdgpu_bo_create_kernel+0x3b/0xa0 [amdgpu]
> > > [ 1.311229] amdgpu_ttm_init.cold+0x165/0x17f [amdgpu]
> > > [ 1.311349] gmc_v10_0_sw_init+0x2dc/0x430 [amdgpu]
> > > [ 1.311455] amdgpu_device_init.cold+0x1544/0x1b54 [amdgpu]
> > > [ 1.311570] ? acpi_ns_get_node+0x4f/0x5a
> > > [ 1.311574] ? acpi_get_handle+0x8e/0xb7
> > > [ 1.311576] amdgpu_driver_load_kms+0x67/0x320 [amdgpu]
> > > [ 1.311664] amdgpu_pci_probe+0x1bc/0x290 [amdgpu]
> > > [ 1.311750] local_pci_probe+0x42/0x80
> > > [ 1.311753] ? __cond_resched+0x16/0x40
> > > [ 1.311755] pci_device_probe+0xfd/0x1b0
> > > [ 1.311756] really_probe+0xf2/0x460
> > > [ 1.311759] driver_probe_device+0xe8/0x160
> > > [ 1.311760] device_driver_attach+0xa1/0xb0
> > > [ 1.311761] __driver_attach+0x8f/0x150
> > > [ 1.311763] ? device_driver_attach+0xb0/0xb0
> > > [ 1.311764] ? device_driver_attach+0xb0/0xb0
> > > [ 1.311765] bus_for_each_dev+0x78/0xc0
> > > [ 1.311766] bus_add_driver+0x12b/0x1e0
> > > [ 1.311768] driver_register+0x8f/0xe0
> > > [ 1.311769] ? 0xffffffffc1828000
> > > [ 1.311770] do_one_initcall+0x44/0x1d0
> > > [ 1.311772] ? kmem_cache_alloc_trace+0x103/0x240
> > > [ 1.311775] do_init_module+0x5c/0x270
> > > [ 1.311777] __do_sys_finit_module+0xb1/0x110
> > > [ 1.311779] do_syscall_64+0x40/0xb0
> > > [ 1.311781] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 1.311783] RIP: 0033:0x7f69fb094679
> > > [ 1.311785] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
> > > [ 1.311786] RSP: 002b:00007ffce4131708 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > [ 1.311788] RAX: ffffffffffffffda RBX: 000055d71350a3a0 RCX: 00007f69fb094679
> > > [ 1.311788] RDX: 0000000000000000 RSI: 00007f69fb234eed RDI: 0000000000000013
> > > [ 1.311789] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055d7134f3930
> > > [ 1.311789] R10: 0000000000000013 R11: 0000000000000246 R12: 00007f69fb234eed
> > > [ 1.311790] R13: 0000000000000000 R14: 000055d7134da0f0 R15: 000055d71350a3a0
> > > [ 1.311791] ---[ end trace ff47998e3140e95d ]---
> > > [ 1.311793] [drm:amdgpu_ttm_gart_bind [amdgpu]] *ERROR* failed to bind 1 pages at 0x00400000
> > > [ 1.312100] amdgpu 0000:03:00.0: amdgpu: 00000000989bdfac bind failed
> > >
> > > and using https://patchwork.freedesktop.org/patch/469907/
> > > gives this message:
> > >
> > > [ 1.311502] ------------[ cut here ]------------
> > > [ 1.311502] WARNING: CPU: 9 PID: 221 at drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:244 amdgpu_gart_bind+0x16/0x20 [amdgpu]
> > > [ 1.311602] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit drm_ttm_helper hid_sensor_hub ttm hid_generic nvme xhci_pci drm_kms_helper nvme_core t10_pi xhci_hcd crc_t10dif r8169 cec crct10dif_generic i2c_hid_acpi amd_sfh rc_core crct10dif_pclmul realtek i2c_hid crc32_pclmul mdio_devres psmouse usbcore crc32c_intel drm libphy i2c_piix4 crct10dif_common button wmi battery video fjes(-) hid
> > > [ 1.311614] CPU: 9 PID: 221 Comm: systemd-udevd Not tainted 5.13.0+ #6
> > > [ 1.311616] Hardware name: Micro-Star International Co., Ltd.
> > > Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> > > [ 1.311617] RIP: 0010:amdgpu_gart_bind+0x16/0x20 [amdgpu]
> > > [ 1.311701] Code: 39 df 74 aa eb dc e8 19 a2 f6 f0 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 4c 8b 8f 20 25 00 00 4d 85 c9 74 05 e9 3a ff ff ff <0f> 0b c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 fd 53 0f ae
> > > [ 1.311702] RSP: 0018:ffffb5be80b17948 EFLAGS: 00010246
> > > [ 1.311703] RAX: 0000000000000022 RBX: ffff98670583e858 RCX: ffff98677c8a8738
> > > [ 1.311704] RDX: 0000000000000001 RSI: 0000000000400000 RDI: ffff986720ce0000
> > > [ 1.311704] RBP: ffff986705841a08 R08: 0000000000000067 R09: 0000000000000000
> > > [ 1.311705] R10: ffff986705841a08 R11: 0000000000000400 R12: 0000000000000000
> > > [ 1.311705] R13: ffff98670dc50e40 R14: ffff986720ce0000 R15: 0000000000000000
> > > [ 1.311706] FS: 00007ff4ee0968c0(0000) GS:ffff9869de840000(0000) knlGS:0000000000000000
> > > [ 1.311707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1.311707] CR2: 00007ff4ee07fdca CR3: 000000017b436000 CR4: 0000000000750ee0
> > > [ 1.311708] PKRU: 55555554
> > > [ 1.311708] Call Trace:
> > > [ 1.311710] amdgpu_ttm_alloc_gart+0x147/0x190 [amdgpu]
> > > [ 1.311793] amdgpu_bo_create_reserved.part.0+0xf8/0x1b0 [amdgpu]
> > > [ 1.311873] ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> > > [ 1.311952] amdgpu_bo_create_kernel+0x3b/0xa0 [amdgpu]
> > > [ 1.312031] amdgpu_ttm_init.cold+0x165/0x17f [amdgpu]
> > > [ 1.312181] gmc_v10_0_sw_init+0x2dc/0x430 [amdgpu]
> > > [ 1.312275] amdgpu_device_init.cold+0x1544/0x1b54 [amdgpu]
> > > [ 1.312385] ? acpi_ns_get_node+0x4f/0x5a
> > > [ 1.312388] ? acpi_get_handle+0x8e/0xb7
> > > [ 1.312390] amdgpu_driver_load_kms+0x67/0x320 [amdgpu]
> > > [ 1.312479] amdgpu_pci_probe+0x1bc/0x290 [amdgpu]
> > > [ 1.312573] local_pci_probe+0x42/0x80
> > > [ 1.312578] ? __cond_resched+0x16/0x40
> > > [ 1.312581] pci_device_probe+0xfd/0x1b0
> > > [ 1.312583] really_probe+0xf2/0x460
> > > [ 1.312587] driver_probe_device+0xe8/0x160
> > > [ 1.312589] device_driver_attach+0xa1/0xb0
> > > [ 1.312591] __driver_attach+0x8f/0x150
> > > [ 1.312592] ? device_driver_attach+0xb0/0xb0
> > > [ 1.312593] ? device_driver_attach+0xb0/0xb0
> > > [ 1.312594] bus_for_each_dev+0x78/0xc0
> > > [ 1.312595] bus_add_driver+0x12b/0x1e0
> > > [ 1.312597] driver_register+0x8f/0xe0
> > > [ 1.312598] ? 0xffffffffc1696000
> > > [ 1.312599] do_one_initcall+0x44/0x1d0
> > > [ 1.312602] ? kmem_cache_alloc_trace+0x103/0x240
> > > [ 1.312604] do_init_module+0x5c/0x270
> > > [ 1.312606] __do_sys_finit_module+0xb1/0x110
> > > [ 1.312608] do_syscall_64+0x40/0xb0
> > > [ 1.312610] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 1.312614] RIP: 0033:0x7ff4ee54e679
> > > [ 1.312616] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
> > > [ 1.312617] RSP: 002b:00007fff9c9d5bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > [ 1.312618] RAX: ffffffffffffffda RBX: 000055aca9cf9230 RCX: 00007ff4ee54e679
> > > [ 1.312619] RDX: 0000000000000000 RSI: 00007ff4ee6eeeed RDI: 0000000000000013
> > > [ 1.312619] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055aca9cdf480
> > > [ 1.312620] R10: 0000000000000013 R11: 0000000000000246 R12: 00007ff4ee6eeeed
> > > [ 1.312620] R13: 0000000000000000 R14: 000055aca9cf9ab0 R15: 000055aca9cf9230
> > > [ 1.312622] ---[ end trace c4fc99f16e2a1eb7 ]---
> > >
> > > In both cases there's a similar error message for the other GPU in
> > > the system (the integrated one at 08:00.0)
> > >
> > > On Wednesday, 19.01.2022 at 23:40 +0100, Das, Nirmoy wrote:
> > > >
> > > > On 1/19/2022 10:59 PM, Limonciello, Mario wrote:
> > > > > [Public]
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Bert Karwatzki <spasswolf@xxxxxx>
> > > > > > Sent: Wednesday, January 19, 2022 15:52
> > > > > > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > Cc: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; Kazlauskas, Nicholas
> > > > > > <Nicholas.Kazlauskas@xxxxxxx>; Zhuo, Qingqing (Lillian)
> > > > > > <Qingqing.Zhuo@xxxxxxx>; Scott Bruce <smbruce@xxxxxxxxx>; Alex Deucher
> > > > > > <alexdeucher@xxxxxxxxx>; Chris Hixon <linux-kernel-bugs@xxxxxxxxxxxxx>
> > > > > > Subject: amd-staging-drm-next breaks suspend
> > > > > >
> > > > > > I just tested amd-staging-drm-next with HEAD
> > > > > > f1b2924ee6929cb431440e6f961f06eb65d52beb:
> > > > > > Going into suspend leads to a hang again:
> > > > > > This is probably caused by
> > > > > > [ 1.310551] trying to bind memory to uninitialized GART !
> > > > > > and/or
> > > > > > [ 3.976438] trying to bind memory to uninitialized GART !
> > > >
> > > > Could you please also try
> > > > https://patchwork.freedesktop.org/patch/469907/ ?
> > > >
> > > > Regards,
> > > >
> > > > Nirmoy