On 03/11/2015 09:14 AM, Daniel Vetter wrote: > On Wed, Mar 11, 2015 at 02:53:39PM +0000, John Harrison wrote: >> On 05/03/2015 14:49, Daniel Vetter wrote: >>> On Thu, Mar 05, 2015 at 01:57:31PM +0000, John.C.Harrison@xxxxxxxxx wrote: >>>> From: John Harrison <John.C.Harrison@xxxxxxxxx> >>>> >>>> The LRC submission code requires a request for tracking purposes. It does not >>>> actually require that request to 'complete' it simply uses it for keeping hold >>>> of reference counts on contexts and such like. >>>> >>>> In the case where the ring buffer is completely full, the LRC code looks for a >>>> pending request that would free up sufficient space upon completion and waits >>>> for it. If no such request can be found it resorts to simply polling the free >>>> space count until it is big enough. This situation should only occur when the >>>> entire buffer is filled with the request currently being generated. I.e., the >>>> user is trying to submit a single piece of work that is large than the ring >>>> buffer itself (which should be impossible because very large batch buffers don't >>>> consume any more ring buffer space). Before starting to poll, a submit call is >>>> made to make sure that the currently queued up work in the buffer will actually >>>> be submtted and thus the poll will eventually succeed. >>>> >>>> The problem here is that the 'official' request cannot be used as that could >>>> lead to multiple LRC submissions being tagged to a single request structure. >>>> Instead, the code fakes up a private request structure and uses that. >>>> >>>> This patch moves the faked request allocation higher up in the call stack to the >>>> wait code itself (rather than being at the very lowest submission level). Thus >>>> it is now obvious where the faked request is coming from and why it is >>>> necessary. The patch also replaces it with a call to the official request >>>> allocation code rather than attempting to duplicate that code. This becomes >>>> especially important in the future when the request allocation changes to >>>> accommodate a conversion to struct fence. >>>> >>>> For: VIZ-5115 >>>> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx> >>> This is only possible if you pile up tons of olr. Since your patch series >>> fixes this design issue by removing olr I think we can just put a WARN_ON >>> in here if this ever happens and bail out with -ELOSTMYMARBLES or >>> something. And then rip out all this complexity. >>> >>> Or do I miss something important? >>> -Daniel >> >> Yeah, you missed the extremely important bug in the free space calculation >> that meant this impossible code path was being hit on a regular basis. The >> LRC wait_request code differed from the legacy wait_request code in the the >> latter was updated with request->postfix changes and the former was not. >> Thus the LRC one would happily find a request that frees up enough space, >> wait on it, retire it and then find there was still not enough space! >> >> New patches to fix the space calculation bug and to completely remove the >> polling path will be forth coming... > > Hm, Jesse did dig out a regression where gem_ringfill blows up on > execlist. That igt is specifically to exercise this cornercases. I'm not > sure whith bugzilla Jesse meant, there's two: > > https://bugs.freedesktop.org/show_bug.cgi?id=89001 > https://bugs.freedesktop.org/show_bug.cgi?id=88865 > > Can you please check whether your fixes help for those issues two? > > Also adding Jesse since he's chasing these atm. Ah cool, sounds related at least. 89001 was the one I looked at yesterday. The worrying thing was that this bug caused a hard system hang. Not even the reset button worked... I guess we should have an HSD for that too so we can root cause the system part of it. Jesse _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx