Am 13.04.2018 um 09:01 schrieb S, Shirish: > > > On 4/13/2018 11:53 AM, Christian König wrote: >> Am 13.04.2018 um 06:07 schrieb Shirish S: >>> amdgpu_ib_ring_tests() runs test IB's on rings at boot >>> contributes to ~500 ms of amdgpu driver's boot time. >>> >>> This patch defers it and adds a check to report >>> in amdgpu_info_ioctl() if it was scheduled or not. >> >> That is rather suboptimal, but see below. >> > Which part is sub-optimal, deferring or checking if the work is > scheduled? That was about the check. We should wait for the test to finish instead of printing an error and continuing. >>> >>> Signed-off-by: Shirish S <shirish.s at amd.com> >>> --- >>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h       | 2 ++ >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++--- >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   | 3 +++ >>>  3 files changed, 22 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> index 5734871..ae8f722 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> @@ -1611,6 +1611,8 @@ struct amdgpu_device { >>>       /* delayed work_func for deferring clockgating during resume */ >>>      struct delayed_work    late_init_work; >>> +   /* delayed work_func to defer testing IB's on rings during boot */ >>> +   struct delayed_work    late_init_test_ib_work; >> >> You must put the IB test into the late_init_work as well, otherwise >> the two delayed workers can race with each other. >> > I thought from the comment above the declaration, its clear why i am > creating 2 work structures. > late_init_work is to optimize resume time and late_init_test_ib_work > is to optimize the boot time. > There cant be race as the context in which they are called are totally > different. Late init enables power and clock gating. If I'm not completely mistaken we don't do the power/clock gating earlier because we had to wait for the IB test to finish. Could be that modern ASICs have additional logic to prevent that, but the last time I worked on this power gating a block while you run something on it could even crash the whole system. >>>       struct amdgpu_virt   virt; >>>      /* firmware VRAM reservation */ >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 1762eb4..e65a5e6 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); >>>  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); >>>   #define AMDGPU_RESUME_MS       2000 >>> +#define AMDGPU_IB_TEST_SCHED_MS       2000 >>>   static const char *amdgpu_asic_name[] = { >>>      "TAHITI", >>> @@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum >>> amd_asic_type asic_type) >>>      } >>>  } >>>  +static void amdgpu_device_late_init_test_ib_func_handler(struct >>> work_struct *work) >>> +{ >>> +   struct amdgpu_device *adev = >>> +       container_of(work, struct amdgpu_device, >>> late_init_test_ib_work.work); >>> +   int r = amdgpu_ib_ring_tests(adev); >>> + >>> +   if (r) >>> +       DRM_ERROR("ib ring test failed (%d).\n", r); >>> +} >>> + >>>  /** >>>   * amdgpu_device_has_dc_support - check if dc is supported >>>   * >>> @@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device >>> *adev, >>>      INIT_LIST_HEAD(&adev->ring_lru_list); >>>      spin_lock_init(&adev->ring_lru_list_lock); >>>  +   INIT_DELAYED_WORK(&adev->late_init_test_ib_work, >>> +             amdgpu_device_late_init_test_ib_func_handler); >>>      INIT_DELAYED_WORK(&adev->late_init_work, >>>                amdgpu_device_ip_late_init_func_handler); >>>  @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device >>> *adev, >>>          goto failed; >>>      } >>>  -   r = amdgpu_ib_ring_tests(adev); >>> -   if (r) >>> -       DRM_ERROR("ib ring test failed (%d).\n", r); >>> +   /* Schedule amdgpu_ib_ring_tests() */ >>> +   mod_delayed_work(system_wq, &adev->late_init_test_ib_work, >>> +           msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS)); >> >> That doesn't work like you intended. mod_delayed_work() overrides the >> existing handler. >> >> What you wanted to use is queue_delayed_work(), but as I said we >> should only have one delayed worker. > mod_delayed_work() is safer and optimal method that replaces > cancel_delayed_work() followed by queue_delayed_work(). > (https://lkml.org/lkml/2011/2/3/175) > But if you strongly insist i don't mind changing it. Well, mod_delayed_work() does NOT replace queue_delayed_work(). Those two functions are for different use cases. The link you posted actually explains it quite well: > So, cancel_delayed_work() followed by queue_delayed_work() schedules > the work to be executed at the specified time regardless of the > current pending state while queue_delayed_work() takes effect iff > currently the work item is not pending. queue_delayed_work() takes only effect if the work item is not already pending/executing. In other words with queue_delayed_work() you don't risk executing things twice when the work is already executing. While with mod_delayed_work() you absolutely make sure to execute things twice when it is already running. If I'm not completely mistaken it should not matter in this case, but queue_delayed_work() is used more commonly I think. >> >>>       if (amdgpu_sriov_vf(adev)) >>>          amdgpu_virt_init_data_exchange(adev); >>> @@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device >>> *adev) >>>      } >>>      adev->accel_working = false; >>>      cancel_delayed_work_sync(&adev->late_init_work); >>> + cancel_delayed_work_sync(&adev->late_init_test_ib_work); >>>      /* free i2c buses */ >>>      if (!amdgpu_device_has_dc_support(adev)) >>>          amdgpu_i2c_fini(adev); >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> index 487d39e..057bd9a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> @@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device >>> *dev, void *data, struct drm_file >>>      if (!info->return_size || !info->return_pointer) >>>          return -EINVAL; >>>  +   if (delayed_work_pending(&adev->late_init_test_ib_work)) >>> +       DRM_ERROR("IB test on ring not executed\n"); >>> + >> >> Please use flush_delayed_work() instead of issuing and error here. >> > Agree, wasn't sure of what to do here :). > So i will re-spin with the flush part added. Hope this reply clarifies > your comments. > Thanks. > Regards, > Shirish S >> Regards, >> Christian. >> >>>      switch (info->query) { >>>      case AMDGPU_INFO_ACCEL_WORKING: >>>          ui32 = adev->accel_working; >> >