On 24/04/2018 15:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-24 14:55:51)
>> On 24/04/2018 12:28, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-24 12:17:15)
>>>> On 24/04/2018 11:40, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-04-24 11:14:21)
>>>>>> On 23/04/2018 19:08, Chris Wilson wrote:
>>>>>>> -static int reserve_engine(struct intel_engine_cs *engine)
>>>>>>> +static int reserve_gt(struct drm_i915_private *i915)
>>>>>>>  {
>>>>>>> -	struct drm_i915_private *i915 = engine->i915;
>>>>>>> -	u32 active = ++engine->timeline->inflight_seqnos;
>>>>>>> -	u32 seqno = engine->timeline->seqno;
>>>>>>>  	int ret;
>>>>>>>
>>>>>>> -	/* Reservation is fine until we need to wrap around */
>>>>>>> -	if (unlikely(add_overflows(seqno, active))) {
>>>>>>> +	/*
>>>>>>> +	 * Reservation is fine until we may need to wrap around
>>>>>>> +	 *
>>>>>>> +	 * By incrementing the serial for every request, we know that no
>>>>>>> +	 * individual engine may exceed that serial (as each is reset to 0
>>>>>>> +	 * on any wrap). This protects even the most pessimistic of migrations
>>>>>>> +	 * of every request from all engines onto just one.
>>>>>>> +	 */
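
For reference, the function in the posted patch continues roughly as below. This is a sketch from the i915 sources of that era, quoted from memory rather than from the patch itself, so the exact field and helper names (gt.request_serial, reset_all_global_seqno(), i915_gem_unpark()) should be treated as recollection:

static int reserve_gt(struct drm_i915_private *i915)
{
	int ret;

	/*
	 * One GT-wide serial, bumped for every request (see the comment in
	 * the hunk above): if the u32 serial wraps, push every engine
	 * timeline back to seqno 0 before handing out the new serial.
	 */
	while (unlikely(++i915->gt.request_serial == 0)) {
		ret = reset_all_global_seqno(i915, 0);
		if (ret) {
			i915->gt.request_serial--;
			return ret;
		}
	}

	/* The first request in flight also unparks the GT. */
	if (!i915->gt.active_requests++)
		i915_gem_unpark(i915);

	return 0;
}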
>>>>>> I didn't really figure out what was wrong with v1? Neither could handle
>>>>>> more than four billion simultaneously active requests - but I thought
>>>>>> that should not concern us. :)
>>>>> It was still using the local engine->timeline.seqno as its base. If we
>>>>> swapped from one at 0 to another at U32_MAX, we would overflow much
>>>>> later, during submission, after the point of no return.
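
To spell out the failure mode: the v1 check only guarded the submitting engine's own timeline. A minimal sketch of the arithmetic (add_overflows() here is the simplified open-coded form of the helper in i915_utils.h):

/* u32 overflow predicate: true if a + b wraps past U32_MAX. */
static inline bool add_overflows(u32 a, u32 b)
{
	return a + b < a;
}

/*
 * v1: reserve against engine A, whose timeline seqno is 0, so
 * add_overflows(0, active) is false and the check passes. If the
 * request is then migrated to engine B, whose seqno is near U32_MAX,
 * the actual seqno assignment wraps during submission, after the
 * point of no return. A single GT-wide serial, with every engine
 * reset to 0 on any wrap, bounds all engines at once and closes
 * that window.
 */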
>>>> By "swapped" you already refer to an engine change? Ok, I can see that,
>>>> yes. In this case the global counter does prevent that.
>>>>
>>>> In the light of that, what is your current thinking with regards to
>>>> mixing engine classes?
>>> That classes are a hw limitation that doesn't impact the balancing itself,
>>> just which engines the user is allowed to put into the same group.
>>>> If the thinking is still to only allow grouping within a class, then a
>>>> per-class seqno counter would be an option.
>>> The goal of localising the seqno here was to try and reduce the locking
>>> requirements (or at least make it easier to reduce them in future).
>>> Whether it's one u32 across all engines, or one u32 across a few, isn't
>>> enough for me to worry about. The breadcrumb tracking should be happy
>>> enough (it sorts by i915_seqno_passed rather than by absolute u32), so
>>> the only limitation in wrapping should be gen7 HW semaphores. Hmm, with
>>> a bit of thought, I believe we can reduce the wrap logic to simply skip
>>> the semaphore sync inside the danger zone. It would be worth the effort.
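
For reference, the wrap-tolerant comparison the breadcrumb tracking sorts by is the usual signed-difference test:

/*
 * True if seq1 is at or after seq2, tolerating u32 wrap-around,
 * assuming the two values never diverge by more than 2^31.
 */
static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}

Gen7 HW semaphores, by contrast, compare raw seqno values in hardware, which is why only they care about the absolute wrap (the "danger zone" above).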
>> I was thinking about reducing the number of global seqno resets as much
>> as we can in general. For instance, would it be possible to keep using
>> gt.active_requests together with a new gt.max_engine_seqno? The latter
>> would be the maximum last-allocated seqno across the engine timelines.
>> This way the reset would be much less frequent if the load is distributed
>> over the engines (less frequent by a factor of the number of engines).
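
A rough sketch of that suggestion, purely hypothetical (gt.max_engine_seqno is the new field proposed above, not something in the driver; add_overflows() and reset_all_global_seqno() as sketched earlier):

static int reserve_gt(struct drm_i915_private *i915)
{
	/*
	 * Hypothetical: a wrap is only possible once the highest seqno
	 * handed out on any engine, plus every request still in flight
	 * (each of which may yet migrate onto that engine, consuming a
	 * seqno there), nears U32_MAX. max_engine_seqno would have to be
	 * maintained wherever engine timeline seqnos are allocated.
	 */
	if (unlikely(add_overflows(i915->gt.max_engine_seqno,
				   i915->gt.active_requests + 1))) {
		int ret = reset_all_global_seqno(i915, 0);
		if (ret)
			return ret;
		i915->gt.max_engine_seqno = 0;
	}

	if (!i915->gt.active_requests++)
		i915_gem_unpark(i915);

	return 0;
}

With the load spread over N engines, the reset would then fire roughly N times less often than with a single per-request serial.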
> I win with a divide by 0, by removing the global seqno and wrap. :-p
>
> The frequency we are talking about is a short wrap (it takes only as long
> as the active requests take to sync) approximately every 47 days divided
> by N engines. The cost of contention on struct_mutex must surely outweigh
> that during those 47/N days...
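
(Presumably from the size of the u32 seqno space at a nominal submission rate: 2^32 requests at ~1000 requests/second is 4294967296 / 1000 ≈ 4.3 million seconds ≈ 49.7 days, which lands near 47 days at a slightly higher rate.)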
You may be right, but where did 47 days come from? :)
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx