On Mon, Aug 26, 2013 at 01:42:12PM -0700, Ben Widawsky wrote:
> On Sat, Aug 10, 2013 at 10:16:32PM +0100, Chris Wilson wrote:
> > Ignoring the legacy DRI1 code, and a couple of special cases (to be
> > discussed later), all access to the ring is mediated through requests.
> > The first write to a ring will grab a seqno and mark the ring as having
> > an outstanding_lazy_request. Either through explicitly adding a request
> > after an execbuffer or through an implicit wait (either by the CPU or by
> > a semaphore), that sequence of writes will be terminated with a request.
> > So we can elide all the intervening writes to the tail register and
> > send the entire command stream to the GPU at once. This will reduce the
> > number of *serialising* writes to the tail register by a factor of 3-5
> > times (depending upon architecture and number of workarounds, context
> > switches, etc. involved). This becomes even more noticeable when the
> > register write is overloaded with a number of debugging tools. The
> > astute reader will wonder if it is then possible to overflow the ring
> > with a single command. It is not. When we start a command sequence to
> > the ring, we check for available space and issue a wait if we do not
> > have enough. The ring wait will in this case be forced to flush the
> > outstanding register write and then poll the ACTHD for sufficient
> > space to continue.
> >
> > The exception to the rule where everything is inside a request is a
> > few initialisation cases where we may want to write GPU commands via
> > the CS before userspace wakes up and page flips.
>
> I'm not a huge fan of having the second intel_ring_advance() that does
> something other than it sounds like. I'd *much* prefer to not
> intel_ring_advance() if you are certain more emits will be coming, like
> in the case you mention above.
> We can add a paranoia check whenever we're about to return to userspace
> that tail == RING_TAIL.
>
> Also, without performance data, it's hard to say this indirection is
> worth it.

Just a sample, UXA on i5-2520qm, aa10text: 2580000.0/sec -> 2980000.0/sec
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx