On Fri, Nov 22, 2024 at 5:03 AM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > > Am 18.11.24 um 21:03 schrieb Mario Limonciello: > > As part of the suspend sequence VRAM needs to be evicted on dGPUs. > > In order to make suspend/resume more reliable we moved this into > > the pmops prepare() callback so that the suspend sequence would fail > > but the system could remain operational under high memory usage suspend. > > > > Another class of issues exist though where due to memory fragementation > > there isn't a large enough contiguous space and swap isn't accessible. > > > > Add support for a suspend/hibernate notification callback that could > > evict VRAM before tasks are frozen. This should allow paging out to swap > > if necessary. > > > > Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 > > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx> > > I'm absolutely not an expert on that stuff but it looks rather > reasonable to me. > > Alex what do you think? Yeah, same. Seems reasonable to me. Acked-by: Alex Deucher <alexander.deucher@xxxxxxx> > > Regards, > Christian. > > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 ++++++++++++++++++++++ > > 2 files changed, 37 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > index a37e687acbbc5..e70ca85151046 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > @@ -885,6 +885,7 @@ struct amdgpu_device { > > bool need_swiotlb; > > bool accel_working; > > struct notifier_block acpi_nb; > > + struct notifier_block pm_nb; > > struct amdgpu_i2c_chan *i2c_bus[AMDGPU_MAX_I2C_BUS]; > > struct debugfs_blob_wrapper debugfs_vbios_blob; > > struct debugfs_blob_wrapper debugfs_discovery_blob; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index b3ca911e55d61..5a4e9c7daf895 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -190,6 +190,8 @@ void amdgpu_set_init_level(struct amdgpu_device *adev, > > } > > > > static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev); > > +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, > > + void *data); > > > > /** > > * DOC: pcie_replay_count > > @@ -4582,6 +4584,11 @@ int amdgpu_device_init(struct amdgpu_device *adev, > > > > amdgpu_device_check_iommu_direct_map(adev); > > > > + adev->pm_nb.notifier_call = amdgpu_device_pm_notifier; > > + r = register_pm_notifier(&adev->pm_nb); > > + if (r) > > + goto failed; > > + > > return 0; > > > > release_ras_con: > > @@ -4646,6 +4653,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) > > drain_workqueue(adev->mman.bdev.wq); > > adev->shutdown = true; > > > > + unregister_pm_notifier(&adev->pm_nb); > > + > > /* make sure IB test finished before entering exclusive mode > > * to avoid preemption on IB test > > */ > > @@ -4777,6 +4786,33 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) > > /* > > * Suspend & resume. > > */ > > +/** > > + * amdgpu_device_pm_notifier - Notification block for Suspend/Hibernate events > > + * @nb: notifier block > > + * @mode: suspend mode > > + * @data: data > > + * > > + * This function is called when the system is about to suspend or hibernate. > > + * It is used to evict resources from the device before the system goes to > > + * sleep while there is still access to swap. > > + * > > + */ > > +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, > > + void *data) > > +{ > > + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb); > > + > > + switch (mode) { > > + case PM_HIBERNATION_PREPARE: > > + case PM_SUSPEND_PREPARE: > > + if (amdgpu_device_evict_resources(adev)) > > + return NOTIFY_BAD; > > + break; > > + } > > + > > + return NOTIFY_DONE; > > +} > > + > > /** > > * amdgpu_device_prepare - prepare for device suspend > > * >