On Tue, Feb 21, 2017 at 01:19:37PM +0000, Chris Wilson wrote:
> On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:
> > On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> > >
> > > On 21/02/2017 09:37, Chris Wilson wrote:
> > > >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> > > >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > > >>>So that our preempt-off period doesn't grow completely unchecked, or do
> > > >>>we need that 34ms loop?
> > > >>
> > > >>Yes, that's at least how I understand it. Scheduling away is what lets
> > > >>PCODE start servicing some other request than ours or go idle. That's
> > > >>in a way what we see when the preempt-enabled poll times out.
> > > >
> > > >I was thinking along the lines of if it was just busy/unavailable for the
> > > >first 33ms that particular time, it just needed to sleep until ready.
> > > >Once available, the next request ran in the expected 1ms.
> > > >
> > > >Do you not see any value in trying a sleeping loop? Perhaps compromise
> > > >and have the preempt-disable timeout increase each iteration.
> >
> > This fallback method would work too, but imo the worst case is what
> > matters and that would be anyway the same in both cases. Because of this
> > and since it's a WA I'd rather keep it simple.
> >
> > > Parachuting in so apologies if I misunderstood something.
> > >
> > > Is the issue here that we can get starved out of CPU time for more than 33ms
> > > while waiting for an event?
> >
> > We need to actively resend the same request for this duration.
> >
> > > Could we play games with sched_setscheduler and maybe temporarily go
> > > SCHED_DEADLINE or something? Would have to look into how to correctly
> > > restore to the old state from that and from which contexts we can actually
> > > end up in this wait.
> >
> > What would be the benefit wrt. disabling preemption?
> > Note that since it's a workaround it would be good to keep it simple and
> > close to how it worked on previous platforms (SKL/APL).
>
> Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
> interaction at all. Or that we don't handle the failure gracefully. Or
> that the hw appears pretty flimsy and the communication method is hit
> and miss.

Yes, me neither. It's clearly not by design, since based on the
specification two requests 3ms apart would need to be enough.

> I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
> to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
> an order of magnitude beyond the expected response time.
>
> 50 I would ack. :|

Ok, I can resend with that if Tvrtko agrees.

--Imre
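
For illustration only, the "resend the same request until it is acked or
the timeout expires" wait discussed above could be modeled in user space
roughly as follows. This is a sketch under assumptions, not the i915 code:
mock_pcode, mock_request, resend_until_acked and ATOMIC_WAIT_LIMIT_MS are
hypothetical names, time is a simulated counter rather than a real clock,
and _Static_assert merely stands in for the BUILD_BUG_ON mentioned in the
thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed cap on the busy-wait; 50ms is the value agreed on in the thread. */
#define ATOMIC_WAIT_LIMIT_MS 50

/* Compile-time guard, standing in for the kernel's BUILD_BUG_ON. */
_Static_assert(ATOMIC_WAIT_LIMIT_MS <= 50, "busy-wait limit too long");

/* Hypothetical stand-in for the firmware: acks only once it is ready. */
struct mock_pcode {
	int now_ms;      /* simulated clock */
	int ready_at_ms; /* simulated time at which requests start succeeding */
	int requests;    /* number of requests sent so far */
};

/* Send one request; returns true if the mock firmware acked it. */
static bool mock_request(struct mock_pcode *p)
{
	p->requests++;
	return p->now_ms >= p->ready_at_ms;
}

/*
 * Resend the request every step_ms until it is acked or timeout_ms has
 * elapsed. Returns 0 on ack, -1 on timeout. One last request is sent at
 * the deadline itself before giving up.
 */
static int resend_until_acked(struct mock_pcode *p, int timeout_ms, int step_ms)
{
	int deadline;

	assert(step_ms > 0);
	assert(timeout_ms <= ATOMIC_WAIT_LIMIT_MS);

	deadline = p->now_ms + timeout_ms;
	for (;;) {
		if (mock_request(p))
			return 0;
		if (p->now_ms >= deadline)
			return -1;
		p->now_ms += step_ms; /* simulated delay between resends */
	}
}
```

In this model a firmware that only becomes ready after 34ms is still
caught by a 50ms limit, while one that takes longer makes the wait fail
with -1, which the caller would have to handle.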