"Gupta, Sourab" <sourab.gupta@xxxxxxxxx> writes: > On Tue, 2014-05-06 at 17:56 +0000, Eric Anholt wrote: >> sourab.gupta@xxxxxxxxx writes: >> >> > From: Sourab Gupta <sourab.gupta@xxxxxxxxx> >> > >> > This patch is in continuation of and is dependent on earlier patch >> > series to 'reduce the time for which device mutex is kept locked'. >> > (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html) >> >> One of userspace's assumptions is that when you allocate a new BO, you >> can map it and start writing data into it without needing to wait on the >> GPU. I expect this patch to mostly hurt performance on apps (and I note >> that the patch doesn't come with any actual performance data) that get >> more stalls as a result. >> > Hi Eric, > Yes, it may hurt the performance on apps, in case of small buffers and > if blitter engine is busy as there is a synchronous wait for rendering > in the gem_fault handler. If that is the case, we can drop this from the > gem_fault routine and employ it only in the do_execbuffer routine. Its > useful there because there is no synchronous wait required in sw, due > to cross ring synchronization. > We'll gather the numbers to quantify the performance benefit we have > while using blitter engines in this way for different buffer sizes. > >> More importantly, though, it breaks existing userspace that relies on >> buffers being idle on allocation, for the unsychronized maps used in >> intel_bufferobj_subdata() and >> intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT | >> GL_UNSYNCHRONIZED_BIT) > > Sorry, I miss your point here. It may not break this assumption due to > the fact that we employ this method only in case of the preallocate > routine, which will be called in the first page fault of the object > (gem_fault handler) resulting in fresh allocation of pages. > > > So, in case of unsynchronized maps, there may be a wait involved in the > first page fault. 
Also, that wait time may be lesser than the time > required for CPU memset (resulting in no performance hit). > There won't be any subsequent waits afterwards for that buffer object. > > Though, we'll have performance hit in the case when blitter engine is > already busy and may not be available to immediately start the memset of > freshly allocated mmaped buffers. > > Am I missing something here? Does the userspace requirement for > unsynchronized mapped objects involve complete idleness of object on gpu > even when object page faults for the first time? Oh, I mised how this works. So at pagefault time, you're firing off the blit, then immediately stalling on it? This sounds even less like a possible performance win than I was initially thinking.
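
The trade-off being argued here can be made concrete with a toy cost model. This is only a sketch, not i915 code: `ClearCost`, the function names, and every timing below are hypothetical and chosen purely for illustration, not measured. It compares zeroing freshly allocated pages on the CPU inside the fault handler against submitting a blit at fault time and stalling on it synchronously.

```python
from dataclasses import dataclass


@dataclass
class ClearCost:
    """Hypothetical timings in microseconds; illustrative, not measured."""
    cpu_memset_us: float   # CPU zeroes the freshly allocated pages
    blit_us: float         # blitter zeroes the same pages
    blit_queue_us: float   # blit waits behind work already queued on the blitter


def fault_latency_cpu(c: ClearCost) -> float:
    # CPU path: the fault handler clears the pages itself.
    return c.cpu_memset_us


def fault_latency_blit_sync(c: ClearCost) -> float:
    # Blit path with a synchronous wait: the handler submits the blit and
    # then stalls until it completes, so any queueing delay is paid in full.
    return c.blit_queue_us + c.blit_us


def blit_wins(c: ClearCost) -> bool:
    # True only if the first page fault returns sooner via the blit path.
    return fault_latency_blit_sync(c) < fault_latency_cpu(c)


# Assumed numbers: same buffer, idle blitter vs. busy blitter.
idle = ClearCost(cpu_memset_us=100.0, blit_us=40.0, blit_queue_us=0.0)
busy = ClearCost(cpu_memset_us=100.0, blit_us=40.0, blit_queue_us=500.0)

print(blit_wins(idle), blit_wins(busy))
```

Under these made-up numbers the blit path is faster only while the blitter is idle. Any queueing delay is added directly to the fault latency, which is the stall in question.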
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx