On 19/04/2023 23:10, Dixit, Ashutosh wrote:
On Wed, 19 Apr 2023 06:21:27 -0700, Tvrtko Ursulin wrote:
Hi Tvrtko,
On 10/04/2023 23:35, Ashutosh Dixit wrote:
Instead of erroring out when GuC reset is in progress, block waiting for
GuC reset to complete which is a more reasonable uapi behavior.
v2: Avoid race between wake_up_all and waiting for wakeup (Rodrigo)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_hwmon.c | 38 +++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 9ab8971679fe3..8471a667dfc71 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -51,6 +51,7 @@ struct hwm_drvdata {
char name[12];
int gt_n;
bool reset_in_progress;
+ wait_queue_head_t waitq;
};
struct i915_hwmon {
@@ -395,16 +396,41 @@ hwm_power_max_read(struct hwm_drvdata *ddat, long *val)
static int
hwm_power_max_write(struct hwm_drvdata *ddat, long val)
{
+#define GUC_RESET_TIMEOUT msecs_to_jiffies(2000)
+
+ int ret = 0, timeout = GUC_RESET_TIMEOUT;
Patch looks good to me
Great, thanks :)
apart that I am not sure what is the purpose of the timeout? This is just
the sysfs write path or has more callers?
It is just the sysfs path, but the sysfs is accessed also by the oneAPI
stack (Level 0). In the initial version I also didn't have the timeout
thinking that the app can send a signal to the blocked thread to unblock
it. I introduced the timeout after Rodrigo brought it up and I am now
thinking maybe it's better to have the timeout in the driver since the app
has no knowledge of how long GuC resets can take. But I can remove it if
you think it's not needed.
Maybe I am missing something but I don't get why we would need to
provide a timeout facility in sysfs? If the library writes here to
configure something it already has to expect a blocking write by the
nature of a a write(2) and sysfs contract. It can take long for any
reason so I hope we are not guaranteeing some latency number to someone?
Or the concern is just about things getting stuck? In which case I think
Ctrl-C is the answer because ETIME is not even listed as an errno for
write(2).
If the
former perhaps it would be better to just use interruptible everything
(mutex and sleep) and wait for as long as it takes or until user presses
Ctrl-C?
Now we are not holding the mutexes for long, just long enough do register
rmw's. So not holding the mutex across GuC reset as we were
originally. Therefore I am thinking mutex_lock_interruptible is not needed?
The sleep is already interruptible (TASK_INTERRUPTIBLE).
Ah yes, you are right.
Regards,
Tvrtko
Anyway please let me know if you think we need to change anything.
Thanks.
--
Ashutosh
struct i915_hwmon *hwmon = ddat->hwmon;
intel_wakeref_t wakeref;
- int ret = 0;
+ DEFINE_WAIT(wait);
u32 nval;
- mutex_lock(&hwmon->hwmon_lock);
- if (hwmon->ddat.reset_in_progress) {
- ret = -EAGAIN;
- goto unlock;
+ /* Block waiting for GuC reset to complete when needed */
+ for (;;) {
+ mutex_lock(&hwmon->hwmon_lock);
+
+ prepare_to_wait(&ddat->waitq, &wait, TASK_INTERRUPTIBLE);
+
+ if (!hwmon->ddat.reset_in_progress)
+ break;
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ if (!timeout) {
+ ret = -ETIME;
+ break;
+ }
+
+ mutex_unlock(&hwmon->hwmon_lock);
+
+ timeout = schedule_timeout(timeout);
}
+ finish_wait(&ddat->waitq, &wait);
+ if (ret)
+ goto unlock;
+
wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
/* Disable PL1 limit and verify, because the limit cannot be
disabled on all platforms */
@@ -508,6 +534,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
hwmon->ddat.reset_in_progress = false;
+ wake_up_all(&hwmon->ddat.waitq);
mutex_unlock(&hwmon->hwmon_lock);
}
@@ -784,6 +811,7 @@ void i915_hwmon_register(struct drm_i915_private *i915)
ddat->uncore = &i915->uncore;
snprintf(ddat->name, sizeof(ddat->name), "i915");
ddat->gt_n = -1;
+ init_waitqueue_head(&ddat->waitq);
for_each_gt(gt, i915, i) {
ddat_gt = hwmon->ddat_gt + i;