On Fri, Nov 17, 2017 at 04:39:32PM -0800, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
> 
> While working on a customer project, it was noticed that the time
> taken to issue a pre-emption request to the GuC would vary quite
> significantly. The normal case was low microseconds, but the worst
> case was tens of milliseconds. Two separate issues were identified
> as causing this extra delay: the acquisition of the GuC send mutex
> lock, and the stall waiting for an ack from the GuC. Both of these
> steps are necessary because access to the send action mechanism must
> be serialised; it is not designed for multiple concurrent operations.
> 
> Mutex Lock Delay:
> The first issue seems to be with the way mutex_lock() itself is
> implemented in the Linux kernel. Even when the lock is available,
> Linux sometimes decides to do a task switch anyway and does not
> re-schedule the thread that requested the lock for some time. Hence
> the time taken to acquire the mutex lock would generally be
> essentially zero, but in around 0.05% of cases it would be in the
> 6-10ms range. Note that no actual contention of the mutex was ever
> observed; in all cases, the mutex was available.
> 
> The workaround implemented for the customer was to first use a
> mutex_trylock(). If the try fails, it falls back to the regular
> mutex_lock() call instead. As the trylock is not permitted to stall,
> it never hits the random Linux task switch issue. Running overnight,
> 3.2 million pre-emption events were logged without a single mutex
> delay over 10 microseconds (which was the granularity of the
> measurement being used).

This sounds like a core mutex implementation bug. Have you talked with
the relevant maintainers (some of whom even work for Intel)?
-Daniel

> Ack Delay:
> The second issue was with the wait for the GuC's acknowledgment of
> the send action command. In most cases, no delay is required - the
> completion flag is set by the time of the first read. However, around
> 1% of the time the flag was not set and the wait_for() loop then took
> in the region of 8-10ms to return. Analysis showed that this delay
> was also entirely due to the Linux kernel task switching away, caused
> by the use of a sleeping wait in the polling loop. Using a
> non-sleeping wait brought the latency down to the 10-20us range in
> the majority of 'slow' cases. Certain tests had a noticeable spike of
> about 120us, presumably when the delay was actually due to the GuC
> being busy. The worst extremes measured were around 200us, with maybe
> one or two hits of 300us in an overnight run.
> 
> The polling construct being used was a wait_for_us() of 10
> microseconds (spinning poll) followed by a wait_for() of 10ms
> (sleeping poll), with the final fallback being to give up and return
> an error. As the command quite frequently takes just over 10us to
> complete, the code would regularly drop into the second loop. Despite
> the sleep time being set to 1ms, it would actually sleep for an
> average of 1.5ms and, as noted, would commonly hit closer to 10ms.
> 
> The workaround was to extend the busy poll period to 100us in the
> case of a pre-emption request. For any other GuC send action command,
> the original 10us was kept. The reasoning is that a pre-emption
> request is very high priority and urgent, so it is worth burning the
> extra CPU cycles on a busy poll in order to guarantee a suitably
> small latency, whereas the other send action commands are not
> performance critical and a longer but more system-friendly sleep is
> not an issue.
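
As an illustration of the first workaround, here is a minimal sketch of
the trylock-then-lock pattern described under "Mutex Lock Delay" above.
The guc_send_lock() helper and the guc_send_state structure are
hypothetical stand-ins, not the names used in the patches (the real
change would be in the existing GuC send path):

	#include <linux/mutex.h>

	/* Illustrative stand-in for the driver state that owns the
	 * send-serialisation lock. */
	struct guc_send_state {
		struct mutex send_mutex;
	};

	static void guc_send_lock(struct guc_send_state *guc)
	{
		/*
		 * Fast path: mutex_trylock() never sleeps, so when the
		 * lock is uncontended (as it was in every observed case)
		 * the scheduler is never involved and acquisition is
		 * effectively free.
		 */
		if (mutex_trylock(&guc->send_mutex))
			return;

		/* Slow path: fall back to the normal sleeping acquisition. */
		mutex_lock(&guc->send_mutex);
	}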
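
Similarly, a sketch of the second workaround: the two-stage poll from
the "Ack Delay" section, with the busy-poll period made a parameter so
a pre-emption request can pass 100 instead of 10. This uses the i915
wait_for_us()/wait_for() helpers named above (which return 0 on success
and -ETIMEDOUT on timeout); guc_ack_received() is a hypothetical
stand-in for reading the GuC's completion flag:

	static int guc_wait_for_ack(struct intel_guc *guc,
				    int fast_timeout_us)
	{
		int err;

		/*
		 * Busy poll first: no scheduler involvement, so latency
		 * stays in the microsecond range whenever the GuC
		 * responds promptly. A pre-emption request would pass
		 * 100 here; all other send actions keep the original 10.
		 */
		err = wait_for_us(guc_ack_received(guc), fast_timeout_us);

		/*
		 * Sleeping poll as the fallback, giving up after 10ms.
		 * Each nominal 1ms sleep can stretch towards 10ms once
		 * the scheduler switches away - exactly the latency the
		 * longer busy poll above is meant to avoid.
		 */
		if (err)
			err = wait_for(guc_ack_received(guc), 10);

		return err;
	}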
> Note that the two patches included here are based against the
> customer's 4.11 tree, so they might not apply cleanly to the current
> upstream pre-emption implementation. It was noted that a recent
> update to the gem_exec_latency IGT showed "additional latency in the
> mutex for guc preemption", which sounds like the same issue that was
> seen here. Hence posting what I have to a wider audience.
> 
> John Harrison (2):
>   drm/i915: Extend GuC action fast spin time
>   drm/i915: Avoid stalling on GuC send mutex lock
> 
>  drivers/gpu/drm/i915/i915_guc_submission.c | 12 ++++++------
>  drivers/gpu/drm/i915/intel_guc_log.c       |  6 +++---
>  drivers/gpu/drm/i915/intel_huc.c           |  2 +-
>  drivers/gpu/drm/i915/intel_uc.c            | 23 ++++++++++++++++++-----
>  drivers/gpu/drm/i915/intel_uc.h            |  2 +-
>  5 files changed, 29 insertions(+), 16 deletions(-)
> 
> -- 
> 2.13.0

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch