From: John Harrison <John.C.Harrison@xxxxxxxxx>

While working on a customer project, it was noticed that the time taken to
issue a pre-emption request to the GuC would vary quite significantly. The
typical case was low microseconds but the worst case was tens of
milliseconds. Two separate issues were identified as causing this extra
delay: the acquisition of the GuC send mutex lock, and the stall waiting
for an ack from the GuC. Both of these steps are necessary because access
to the send action mechanism must be serialised; it is not designed for
multiple concurrent operations.

Mutex Lock Delay:

The first issue appears to be with the way mutex_lock() itself is
implemented in the Linux kernel. Even when the lock is available, the
kernel sometimes decides to do a task switch anyway and does not
re-schedule the thread that requested the lock for some time. Hence the
time taken to acquire the mutex was generally essentially zero, but in
around 0.05% of cases it was in the 6-10ms range. Note that no actual
contention of the mutex was ever observed; in all cases, the mutex was
available.

The workaround implemented for the customer was to first use
mutex_trylock(), falling back to the regular mutex_lock() call only if the
try fails. As the try lock is not permitted to stall, it never hits the
random task switch issue. Running overnight, 3.2 million pre-emption
events were logged without a single mutex delay over 10 microseconds
(which was the granularity of the measurement being used).

Ack Delay:

The second issue was with the wait for the GuC's acknowledgment of the
send action command. In most cases, no delay is required - the completion
flag is set by the time of the first read. However, around 1% of the time
the flag was not set and the wait_for() loop then took in the region of
8-10ms to return.
Analysis showed that this delay was also entirely due to the kernel task
switching away, because the polling loop uses a sleeping wait. Using a
non-sleeping wait brought the latency down to the 10-20us range in the
majority of 'slow' cases. Certain tests showed a noticeable spike of about
120us, presumably when the delay was actually due to the GuC being busy.
The worst extremes measured were around 200us, with maybe one or two hits
of 300us in an overnight run.

The polling construct being used was a wait_for_us() of 10 microseconds
(spinning poll) followed by a wait_for() of 10ms (sleeping poll), with the
final fall back being to give up and return an error. As the command quite
frequently takes just over 10us to complete, the code would regularly drop
into the second loop. Despite the per-iteration sleep time being set to
1ms, it would actually sleep for an average of 1.5ms and, as noted, would
commonly hit closer to 10ms.

The workaround was to extend the busy poll period to 100us in the case of
a pre-emption request. For any other GuC send action command, the original
10us was kept. The reasoning is that a pre-emption request is a very high
priority, urgent operation and is worth burning the extra CPU cycles on a
busy poll in order to guarantee a suitably small latency, whereas the
other send action commands are not performance critical and a longer but
more system friendly sleep is not an issue.

Note that the two patches included here are based against the customer's
4.11 tree, so they might not apply cleanly to the current upstream
pre-emption implementation. It was noted that a recent update to the
gem_exec_latency IGT showed "additional latency in the mutex for guc
preemption", which sounds like the same issue that was seen here. Hence
posting what I have to a wider audience.
John Harrison (2):
  drm/i915: Extend GuC action fast spin time
  drm/i915: Avoid stalling on GuC send mutex lock

 drivers/gpu/drm/i915/i915_guc_submission.c | 12 ++++++------
 drivers/gpu/drm/i915/intel_guc_log.c       |  6 +++---
 drivers/gpu/drm/i915/intel_huc.c           |  2 +-
 drivers/gpu/drm/i915/intel_uc.c            | 23 ++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_uc.h            |  2 +-
 5 files changed, 29 insertions(+), 16 deletions(-)

-- 
2.13.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx