Quoting Tvrtko Ursulin (2018-04-24 15:48:10)
>
> On 24/04/2018 15:04, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-24 14:55:51)
> >>
> >> On 24/04/2018 12:28, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-04-24 12:17:15)
> >>>>
> >>>> On 24/04/2018 11:40, Chris Wilson wrote:
> >>>>> Quoting Tvrtko Ursulin (2018-04-24 11:14:21)
> >>>>>>
> >>>>>> On 23/04/2018 19:08, Chris Wilson wrote:
> >>>>>>> -static int reserve_engine(struct intel_engine_cs *engine)
> >>>>>>> +static int reserve_gt(struct drm_i915_private *i915)
> >>>>>>>  {
> >>>>>>> -	struct drm_i915_private *i915 = engine->i915;
> >>>>>>> -	u32 active = ++engine->timeline->inflight_seqnos;
> >>>>>>> -	u32 seqno = engine->timeline->seqno;
> >>>>>>>  	int ret;
> >>>>>>>
> >>>>>>> -	/* Reservation is fine until we need to wrap around */
> >>>>>>> -	if (unlikely(add_overflows(seqno, active))) {
> >>>>>>> +	/*
> >>>>>>> +	 * Reservation is fine until we may need to wrap around
> >>>>>>> +	 *
> >>>>>>> +	 * By incrementing the serial for every request, we know that no
> >>>>>>> +	 * individual engine may exceed that serial (as each is reset to 0
> >>>>>>> +	 * on any wrap). This protects even the most pessimistic of migrations
> >>>>>>> +	 * of every request from all engines onto just one.
> >>>>>>> +	 */
> >>>>>>
> >>>>>> I didn't really figure out what was wrong with v1? Neither could handle
> >>>>>> more than four billion simultaneously active requests - but I thought
> >>>>>> that should not concern us. :)
> >>>>>
> >>>>> It was still using the local engine->timeline.seqno as its base. If we
> >>>>> swapped from one engine at 0 to another at U32_MAX, we would overflow
> >>>>> much later in submission; after the point of no return.
> >>>>
> >>>> By swapped you already refer to an engine change? Ok, I can see that,
> >>>> yes. In this case the global counter does prevent it.
> >>>>
> >>>> In light of that, what is your current thinking with regards to
> >>>> mixing engine classes?
> >>>
> >>> That classes are a hw limitation that doesn't impact balancing itself,
> >>> just which engines the user is allowed to put into the same group.
> >>>
> >>>> If the thinking is still to only allow grouping within a class, then a
> >>>> per-class seqno counter would be an option.
> >>>
> >>> The goal of localising the seqno here was to try and reduce the locking
> >>> requirements (or at least make it easier to reduce them in future).
> >>> Whether it's one u32 across all engines, or one u32 across a few, isn't
> >>> enough for me to worry about. The breadcrumb tracking should be happy
> >>> enough (sorted by i915_seqno_passed rather than absolute u32), so the
> >>> only limitation in wrapping should be the gen7 HW semaphores. Hmm, with
> >>> a bit of thought, I believe we can reduce the wrap logic to simply skip
> >>> the semaphore sync inside the danger zone. Would be worth the effort.
> >>
> >> I was thinking about reducing the number of global seqno resets as much
> >> as we can in general. For instance, would it be possible to keep using
> >> gt.active_requests together with a new gt.max_engine_seqno? The latter
> >> would be the maximum last-allocated seqno across the engine timelines.
> >> That way the reset would be much less frequent if the load is
> >> distributed over the engines (roughly num-engines times less frequent).
>
> I win with a divide by 0 by removing the global seqno and wrap. :-p
>
> The frequency we are talking about is a short wrap (it will take as long
> as the active requests take to sync) approximately every 47 days divided
> by N engines. The cost of contention on struct_mutex must surely outweigh
> that during those 47/N days...

You may be right, but where did 47 days come from? :)

U32_MAX * an assumed throughput of 1ms per request.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx