On Tue, 2014-05-06 at 17:56 +0000, Eric Anholt wrote:
> sourab.gupta@xxxxxxxxx writes:
>
> > From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
> >
> > This patch is in continuation of and is dependent on earlier patch
> > series to 'reduce the time for which device mutex is kept locked'.
> > (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html)
>
> One of userspace's assumptions is that when you allocate a new BO, you
> can map it and start writing data into it without needing to wait on the
> GPU. I expect this patch to mostly hurt performance on apps (and I note
> that the patch doesn't come with any actual performance data) that get
> more stalls as a result.
>

Hi Eric,

Yes, it may hurt performance on apps when the buffers are small and the
blitter engine is busy, because the gem_fault handler waits synchronously
for rendering. If that is the case, we can drop this from the gem_fault
routine and employ it only in the do_execbuffer routine. It's useful there
because no synchronous wait is required in software, due to cross-ring
synchronization.

We'll gather numbers to quantify the performance benefit of using the
blitter engine in this way for different buffer sizes.

> More importantly, though, it breaks existing userspace that relies on
> buffers being idle on allocation, for the unsynchronized maps used in
> intel_bufferobj_subdata() and
> intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT |
> GL_UNSYNCHRONIZED_BIT)

Sorry, I don't follow your point here. It may not break this assumption,
because we employ this method only in the preallocate routine, which is
called on the first page fault of the object (in the gem_fault handler)
and results in a fresh allocation of pages. So, in the case of
unsynchronized maps, there may be a wait involved in the first page fault.
Also, that wait may be shorter than the time required for a CPU memset
(resulting in no performance hit).
There won't be any subsequent waits for that buffer object. However, we'll
take a performance hit when the blitter engine is already busy and cannot
immediately start the memset of freshly allocated mmapped buffers.

Am I missing something here? Does the userspace requirement for
unsynchronized mapped objects involve complete idleness of the object on
the GPU even when the object page-faults for the first time?

Regards,
Sourab
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx