Hi,

I have an RX 570 which fails to suspend properly under memory pressure and stays black after waking up. It looks like an allocation failure in the TTM VRAM eviction is to blame:

[635471.240411] kworker/u24:26: page allocation failure: order:0, mode:0x620402(GFP_NOIO|__GFP_HIGHMEM|__GFP_RETRY_MAYFAIL|__GFP_HARDWALL), nodemask=(null),cpuset=/,mems_allowed=0
[635471.240416] CPU: 9 PID: 20884 Comm: kworker/u24:26 Tainted: P OE 5.0.0-13-generic #14-Ubuntu
[635471.240417] Hardware name: MSI MS-7885/X99A SLI PLUS(MS-7885), BIOS 1.80 03/20/2015
[635471.240421] Workqueue: events_unbound async_run_entry_fn
[635471.240421] Call Trace:
[635471.240426] dump_stack+0x63/0x8a
[635471.240428] warn_alloc.cold.119+0x7b/0xfb
[635471.240429] __alloc_pages_slowpath+0xe63/0xea0
[635471.240432] ? flush_tlb_all+0x1c/0x20
[635471.240433] ? change_page_attr_set_clr+0x164/0x1f0
[635471.240434] __alloc_pages_nodemask+0x2c4/0x2e0
[635471.240437] alloc_pages_current+0x81/0xe0
[635471.240442] ttm_alloc_new_pages.isra.16+0x95/0x1e0 [ttm]
[635471.240444] ttm_page_pool_get_pages+0x16b/0x380 [ttm]
[635471.240446] ttm_pool_populate+0x1a3/0x4a0 [ttm]
[635471.240448] ttm_populate_and_map_pages+0x28/0x250 [ttm]
[635471.240450] ? ttm_dma_tt_alloc_page_directory+0x2d/0x60 [ttm]
[635471.240490] amdgpu_ttm_tt_populate+0x56/0xe0 [amdgpu]
[635471.240493] ttm_tt_populate.part.9+0x22/0x60 [ttm]
[635471.240495] ttm_tt_bind+0x4f/0x60 [ttm]
[635471.240497] ttm_bo_handle_move_mem+0x26c/0x500 [ttm]
[635471.240499] ttm_bo_evict+0x142/0x1c0 [ttm]
[635471.240501] ttm_mem_evict_first+0x19a/0x220 [ttm]
[635471.240504] ttm_bo_force_list_clean+0xa1/0x170 [ttm]
[635471.240506] ttm_bo_evict_mm+0x2e/0x30 [ttm]
[635471.240531] amdgpu_bo_evict_vram+0x1a/0x20 [amdgpu]
[635471.240554] amdgpu_device_suspend+0x1dd/0x3d0 [amdgpu]
[635471.240578] amdgpu_pmops_suspend+0x1f/0x30 [amdgpu]
[635471.240579] pci_pm_suspend+0x76/0x130
[635471.240580] ? pci_pm_freeze+0xf0/0xf0
[635471.240582] dpm_run_callback+0x66/0x150
[635471.240582] __device_suspend+0x110/0x490
[635471.240583] async_suspend+0x1f/0x90
[635471.240584] async_run_entry_fn+0x3c/0x150
[635471.240586] process_one_work+0x20f/0x410
[635471.240587] worker_thread+0x34/0x400
[635471.240589] kthread+0x120/0x140
[635471.240589] ? process_one_work+0x410/0x410
[635471.240591] ? __kthread_parkme+0x70/0x70
[635471.240592] ret_from_fork+0x35/0x40
…
[635471.241994] [TTM] Buffer eviction failed
[635471.627554] [TTM] Buffer eviction failed

Subsequently it fails to wake up (all 3 screens black) because of an initialization failure:

[635472.216323] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[635472.216354] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[635472.216384] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
[635472.216387] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
[635472.216390] PM: Device 0000:04:00.0 failed to resume async: error -110

I’m pretty sure the problem is that GFP_NOIO is set, which makes it impossible for the kernel to swap anything out, so it gives up trying to satisfy the allocation. I usually run under considerable memory pressure with a lot of swap (32 GiB RAM + 48 GiB swap; memory usage above 48 GiB is common).

I have looked at the code in question, but I’m not sure where this is coming from: neither ttm nor amdgpu seems to set GFP_NOIO. TTM seems to have per-pool allocation flags, and somehow GFP_NOIO is getting enabled there for the amdgpu pool.

Thanks,
Lorenz
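
P.S. To double-check my reading of the mode value in the allocation failure, here is a small Python sketch that decodes 0x620402 into individual bits. The bit values are my assumption, taken from what I believe include/linux/gfp.h looks like in 5.0; they change between kernel versions, so treat the table as illustrative. The comparison against GFP_HIGHUSER | __GFP_RETRY_MAYFAIL is likewise only a guess at what the TTM pool would normally request, not something I have confirmed.

```python
# Assumed bit values for Linux 5.0 (include/linux/gfp.h); not valid for all kernels.
GFP_BITS = {
    0x000001: "__GFP_DMA",
    0x000002: "__GFP_HIGHMEM",
    0x000004: "__GFP_DMA32",
    0x000008: "__GFP_MOVABLE",
    0x000010: "__GFP_RECLAIMABLE",
    0x000020: "__GFP_HIGH",
    0x000040: "__GFP_IO",
    0x000080: "__GFP_FS",
    0x000100: "__GFP_WRITE",
    0x000200: "__GFP_NOWARN",
    0x000400: "__GFP_RETRY_MAYFAIL",
    0x000800: "__GFP_NOFAIL",
    0x001000: "__GFP_NORETRY",
    0x002000: "__GFP_MEMALLOC",
    0x004000: "__GFP_COMP",
    0x008000: "__GFP_ZERO",
    0x010000: "__GFP_NOMEMALLOC",
    0x020000: "__GFP_HARDWALL",
    0x040000: "__GFP_THISNODE",
    0x080000: "__GFP_ATOMIC",
    0x100000: "__GFP_ACCOUNT",
    0x200000: "__GFP_DIRECT_RECLAIM",
    0x400000: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mask):
    """Return the names of the individual bits set in a gfp mask, lowest bit first."""
    return [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]


# Mode value reported in the page allocation failure.
observed = 0x620402

# Hypothetical baseline: GFP_HIGHUSER (0x6200c2 with the values above)
# plus __GFP_RETRY_MAYFAIL. This is a guess, not read from the TTM code path.
assumed_request = 0x6200C2 | 0x000400

# Bits present in the assumed request but absent from the observed mask.
missing = decode_gfp(assumed_request & ~observed)

print("observed:", decode_gfp(observed))
print("missing: ", missing)
```

With these values the observed mask contains both reclaim bits (which together make up GFP_NOIO) but neither __GFP_IO nor __GFP_FS, so the allocation looks like an ordinary user allocation with those two bits stripped somewhere along the way.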