Re: [PATCH] drm/amd: Add Suspend/Hibernate notification callback support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 22, 2024 at 10:57 AM Mario Limonciello
<mario.limonciello@xxxxxxx> wrote:
>
> On 11/22/2024 08:28, Lazar, Lijo wrote:
> >
> >
> > On 11/19/2024 1:33 AM, Mario Limonciello wrote:
> >> As part of the suspend sequence VRAM needs to be evicted on dGPUs.
> >> In order to make suspend/resume more reliable we moved this into
> >> the pmops prepare() callback so that the suspend sequence would fail
> >> but the system could remain operational under high memory usage suspend.
> >>
> >> Another class of issues exist though where due to memory fragementation
> >> there isn't a large enough contiguous space and swap isn't accessible.
> >>
> >> Add support for a suspend/hibernate notification callback that could
> >> evict VRAM before tasks are frozen. This should allow paging out to swap
> >> if necessary.
> >>
> >> Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
> >> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476
> >> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 ++++++++++++++++++++++
> >>   2 files changed, 37 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index a37e687acbbc5..e70ca85151046 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -885,6 +885,7 @@ struct amdgpu_device {
> >>      bool                            need_swiotlb;
> >>      bool                            accel_working;
> >>      struct notifier_block           acpi_nb;
> >> +    struct notifier_block           pm_nb;
> >>      struct amdgpu_i2c_chan          *i2c_bus[AMDGPU_MAX_I2C_BUS];
> >>      struct debugfs_blob_wrapper     debugfs_vbios_blob;
> >>      struct debugfs_blob_wrapper     debugfs_discovery_blob;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index b3ca911e55d61..5a4e9c7daf895 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -190,6 +190,8 @@ void amdgpu_set_init_level(struct amdgpu_device *adev,
> >>   }
> >>
> >>   static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev);
> >> +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode,
> >> +                                 void *data);
> >>
> >>   /**
> >>    * DOC: pcie_replay_count
> >> @@ -4582,6 +4584,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >>
> >>      amdgpu_device_check_iommu_direct_map(adev);
> >>
> >> +    adev->pm_nb.notifier_call = amdgpu_device_pm_notifier;
> >> +    r = register_pm_notifier(&adev->pm_nb);
> >> +    if (r)
> >> +            goto failed;
> >> +
> >>      return 0;
> >>
> >>   release_ras_con:
> >> @@ -4646,6 +4653,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> >>              drain_workqueue(adev->mman.bdev.wq);
> >>      adev->shutdown = true;
> >>
> >> +    unregister_pm_notifier(&adev->pm_nb);
> >> +
> >>      /* make sure IB test finished before entering exclusive mode
> >>       * to avoid preemption on IB test
> >>       */
> >> @@ -4777,6 +4786,33 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
> >>   /*
> >>    * Suspend & resume.
> >>    */
> >> +/**
> >> + * amdgpu_device_pm_notifier - Notification block for Suspend/Hibernate events
> >> + * @nb: notifier block
> >> + * @mode: suspend mode
> >> + * @data: data
> >> + *
> >> + * This function is called when the system is about to suspend or hibernate.
> >> + * It is used to evict resources from the device before the system goes to
> >> + * sleep while there is still access to swap.
> >> + *
> >> + */
> >> +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode,
> >> +                                 void *data)
> >> +{
> >> +    struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb);
> >> +
> >> +    switch (mode) {
> >> +    case PM_HIBERNATION_PREPARE:
> >> +    case PM_SUSPEND_PREPARE:
> >> +            if (amdgpu_device_evict_resources(adev))
> >
> > This will result in an eviction call on APUs since the flags won't be
> > set by this time. Is that intended?
>
> Very good catch!  I will bump it and modify
> amdgpu_device_evict_resources() to just skip APUs entirely.

We can skip for suspend on APUs, but we need to keep it for hibernation.

Alex

>
> >
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L4739
> >
> > Thanks,
> > Lijo
> >
> >> +                    return NOTIFY_BAD;
> >> +            break;
> >> +    }
> >> +
> >> +    return NOTIFY_DONE;
> >> +}
> >> +
> >>   /**
> >>    * amdgpu_device_prepare - prepare for device suspend
> >>    *
> >
>




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux