On Fri, Jul 10, 2015 at 04:48:53PM +0100, Tomas Elf wrote:
> On 10/07/2015 16:24, Tomas Elf wrote:
> >On 09/07/2015 19:47, Chris Wilson wrote:
> >>On Mon, Jun 08, 2015 at 06:03:18PM +0100, Tomas Elf wrote:
> >>>This patch series introduces the following features:
> >>>
> >>>* Feature 1: TDR (Timeout Detection and Recovery) for gen8 execlist mode.
> >>>* Feature 2: Watchdog Timeout (a.k.a. "media engine reset") for gen8.
> >>>* Feature 3: Context Submission Status Consistency (CSSC) checking
> >>
> >>The high level design is reasonable and conceptually extends the current system in fairly obvious ways.
> >>
> >>In terms of discussing the implementation, I think this can be phased as:
> >>
> >>0. Move to a per-engine hangcheck
> >>
> >>1. Add fake-interrupt recovery for CSSC
> >>
> >> I think this can be done without changing the hangcheck heuristics at all - we just need to think carefully about recovery (which is a nice precursor to per-engine reset). I may be wrong, and so would like to be told so early on! If the fake-interrupt recovery is done as part of the reset handler, we should have (one) fewer concurrency issues to worry about.
> >
> >Some points about moving the CSSC out of the hang checker and into the reset handler:
> >
> >1. If we deal with consistency rectification in the reset handler the turnaround time becomes REALLY long:
> >
> > a. First you have the time to detect the hang and call i915_handle_error(), which raises the reset-in-progress flag, preventing further submissions to the driver.
> >
> > b. Then we go all the way down the per-engine recovery path only to discover an inconsistency that has not been handled, fall back immediately with -EAGAIN, lower the reset-in-progress flag, and let the system continue running, deferring to the next hang check (another full hang detection period).
> >
> > c. Once the hang has been detected AGAIN, we raise the reset-in-progress flag AGAIN and enter the engine reset path a second time.
> >
> > d. At the start of the engine reset path we run the second CSSC detection and realise that we have a stable inconsistency that we can attempt to rectify. We can then try to rectify it and go through with the engine reset... AFTER we've checked that the rectification was indeed effective! If it wasn't and the problem remains, we have to fail the engine recovery mode and fall back to full GPU reset immediately... which we could have done from the hang checker in the first place, had we simply refused to schedule hang recovery, let the context submission state inconsistency persist, and let the hang score keep rising until it reached 2*HUNG, at which point the hang checker would have triggered the full GPU reset fallback (see patch 9/11 for all of this).
> >
> >As you can see, dealing with context submission state inconsistency in the reset path is a very long-winded way of doing it and does not make it more reliable. It is also more complicated to analyse from a concurrency point of view, since we fall back several times and repeatedly raise and lower the reset-in-progress flag, alternating between blocking and allowing driver submissions. It basically becomes very difficult to know what is going on.
> >
> >2. Secondly, and more importantly: if a watchdog timeout is detected and we end up in the per-engine hang recovery path, have to fall back due to an inconsistent context submission state at that point, and the hang checker is turned off, then we're irrecoverably hung. Watchdog timeout is supposed to work without the periodic hang checker, but it won't if CSSC is not ensured at all times. That is why I chose to override the i915.enable_hangcheck flag: the hang checker always runs the consistency pre-check and reschedules itself while there is work pending, so that consistency checking happens asynchronously regardless of everything else and a watchdog timeout that ends up in per-engine recovery finds a consistent state.
> >
> >Granted, if a watchdog timeout hits after we've first detected the inconsistency but before we've had time to rectify it, this doesn't help: with the hang checker turned off we cannot rely on periodic hang checking to schedule hang recovery, so in that case we're still irrecoverably stuck. We could make a change here and do a one-time i915.enable_hangcheck override that schedules hang recovery at that point, if you think it's worth it.
> >
> >Bottom line: the consistency checking must happen at all times and cannot be done as a consequence of a scheduled reset if hang checking is turned off at any point.
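For reference, the override being described has roughly the following shape. This is only a sketch: i915.enable_hangcheck is the real module parameter, but every other name below is made up for illustration.

    struct engine {
        struct delayed_work hangcheck_work;
    };

    static void hangcheck_work_fn(struct work_struct *work)
    {
        struct engine *engine = container_of(to_delayed_work(work),
                                             struct engine, hangcheck_work);

        /*
         * The CSSC pre-check always runs, even when the user has
         * disabled hang detection: watchdog-initiated per-engine
         * resets depend on a consistent submission state.
         */
        cssc_precheck_and_rectify(engine);

        /* Hang scoring and recovery still honour the parameter. */
        if (i915.enable_hangcheck)
            hangcheck_score_and_recover(engine);

        /*
         * Reschedule as long as work is pending, so the pre-check
         * keeps running asynchronously regardless of everything else.
         */
        if (engine_has_pending_work(engine))
            queue_delayed_work(system_long_wq, &engine->hangcheck_work,
                               HANGCHECK_PERIOD_JIFFIES);
    }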
> >As far as concurrency issues around CSSC are concerned - disregarding the complication of handling CSSC in the recovery path and deferring to the next hang detection, with all the concurrency issues that entails - the question really is what kind of concurrency issues we're worried about. If the hang checker determines that we've got a hang, that's a stable state. If the hang checker consistency pre-check determines that we've got a sustained CSSC inconsistency, that's stable too. The states are not changing, so whatever we do will not be because we detected the state in the middle of a transition, and the detection won't be subject to concurrency effects. If the hang checker decides that the inconsistency needs to be rectified and fakes the presumably lost interrupt, and the real, presumed-lost interrupt happens to come in at the same time, then that's fine: the CSB buffer check in the execlist interrupt handler is made to cope with that. We can have any number of calls to the interrupt handler, or just one, and the outcome is supposed to be the same - the only thing that matters is the captured context state changes in the CSB buffer that we act upon.
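To make the idempotency argument concrete, the CSB check being described has roughly this shape (a sketch only; every name in it is made up):

    #define CSB_ENTRIES 6   /* gen8 status buffer is 6 entries deep */
    #define CTX_STATUS_COMPLETE (1 << 4)

    struct csb_entry {
        unsigned int status;
        unsigned int ctx_id;
    };

    struct engine {
        struct csb_entry csb[CSB_ENTRIES];
        unsigned int csb_read;  /* software read pointer */
    };

    /*
     * Only the entries between our cached read pointer and the
     * hardware write pointer are consumed, and each entry exactly
     * once, so N racing calls for one event are as good as one call:
     * a faked interrupt overlapping the real (late) one is harmless.
     */
    static void drain_csb(struct engine *engine)
    {
        unsigned int write = read_csb_write_pointer(engine); /* mmio */

        while (engine->csb_read != write) {
            struct csb_entry *e = &engine->csb[engine->csb_read];

            if (e->status & CTX_STATUS_COMPLETE)
                process_context_completion(engine, e->ctx_id);

            engine->csb_read = (engine->csb_read + 1) % CSB_ENTRIES;
        }
    }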
> >So I'm not entirely sure what concurrency issues might be reason enough to move the CSSC out to the hang recovery path. In fact, I'd be more inclined to create a second async task for it to make sure it's being run at all times. But in that case we might as well let it stay in the hang checker.
> >
> >>(In a similar vein, I think we should move the missed interrupt handler for legacy out of hangcheck, partly to simplify some very confusing code and partly so that we have fewer deviations between legacy/execlists paths.) It also gets us thinking about the consistency detection and when it is viable to do a fake-interrupt and when we must do a full-reset (for example, we only want to fake-interrupt if the consistency check says the GPU is idle, and we definitely want to reset everything if the GPU is executing an alien context.)
> >>
> >> A test here would be to suspend the execlists irq and wait for the recovery. Cheekily we could punch the irq eir by root mmio and check that hangcheck automagically recovers.
> >>
> >>Whilst it would be nice to add the watchdog next, since it is conceptually quite simple and basically just a new hangcheck source with fancy setup - fast hangcheck without soft reset makes for an easy DoS.
> >>
> >>2. TDR
> >>
> >> Given that we have a consistency check and have begun to extend the reset path, we can implement a soft reset that only skips the hung request. (The devil is in the details: whilst the design here looked solid, I think the LRC recovery code could be simplified - I didn't feel another requeue path was required, given that we need only pretend a request completed (fixing up the context image as required) and then use the normal unqueue.) There is also quite a bit of LRC cleanup on the lists which would be useful here.
> >
> >As far as the new requeue (or resubmission) path is concerned, you might have a point here. The reason it's as involved as it is is probably mostly all the validation that takes place in the resubmission path. That is, once the resubmission happens at the end of the per-engine hang recovery path, we want to make extra sure that the context that gets resubmitted in the end (the head element of the queue at that point in time) is in fact the one that was passed down from the per-engine hang recovery path (the context at the head of the queue at the start of the hang recovery path), so that the state of the queue didn't change during hang recovery. Maybe we're too paranoid here.
> 
> Ah, yes, there is one crucial difference between the normal execlists_context_unqueue() function and execlists_TDR_context_unqueue() that means we cannot fully reuse the former for TDR-specific purposes.
> 
> When we resubmit a context via TDR it's important that we do not increment the elsp_submitted counter but otherwise treat the resubmission as a normal submission. The reason for this is that the hardware consistently refuses to send out any kind of interrupt acknowledging the context resubmission. If you just submit the context and increment elsp_submitted for it, like you normally do, the interrupt handler will sit around forever waiting for an interrupt that will never come for the resubmission. It will wait for - and receive - the interrupt for the original submission that caused the original hang, and also the interrupt for the context completion following hang recovery. But it won't receive an interrupt for the resubmission.

But that's just an issue with the current poor design of execlists... If we tracked exactly which request we submitted to each port, with a unique ctx id tag for every submission (incl. subsumed ones, for easier completion handling), you completely eliminate the need for elsp_submitted. And a TDR resubmit is just that. (And also the code is much smaller and much simpler.) So I don't think you have sufficiently reworked execlists to avoid unnecessary duplication.
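As a sketch of that per-port tracking idea - every name below is made up for illustration, and it glosses over locking entirely:

    struct port {
        struct request *request;    /* what we put in this ELSP slot */
        unsigned int ctx_tag;       /* unique per submission */
    };

    struct engine {
        struct port port[2];
        unsigned int next_ctx_tag;
    };

    /*
     * Every ELSP write gets a fresh tag; a TDR resubmit is just
     * another tagged ELSP write, handled like any other submission.
     */
    static void submit_ports(struct engine *engine, struct request *rq0,
                             struct request *rq1)
    {
        engine->port[0] = (struct port){ rq0, ++engine->next_ctx_tag };
        engine->port[1] = (struct port){ rq1,
                                         rq1 ? ++engine->next_ctx_tag : 0 };
        write_elsp(engine, rq0, rq1);   /* the actual hardware poke */
    }

    /*
     * A CSB completion event either names the tag we know sits in
     * port[0], or it is stale and ignored.  Nothing ever waits for a
     * fixed number of events per context, so there is no counter to
     * get out of step when the resubmission interrupt never arrives.
     */
    static void process_context_completion(struct engine *engine,
                                           unsigned int ctx_tag)
    {
        if (engine->port[0].request && engine->port[0].ctx_tag == ctx_tag) {
            retire_request(engine->port[0].request);
            engine->port[0] = engine->port[1];  /* promote port 1 */
            engine->port[1] = (struct port){ NULL, 0 };
        }
    }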
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre