On Tue, Jan 20, 2015 at 12:53:35PM -0800, Kristian Høgsberg wrote:
> On Tue, Jan 20, 2015 at 12:42 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> > My idea for all this would have been to create a per-thread execbuf
> > relocation context with a hashtab to map buffer pointers to execbuf index
> > and a bunch of arrays to prepare the reloc entry tables. If you do it
> > correctly all the per-reloc work should be O(1) streaming writes to a
> > few arrays plus the hashtab lookup. With no code run at execbuf time
> > (except the ioctl ofc). Even the libdrm_bo->presumed_offset update after
> > execbuf could be done lockless (as long as readers are careful to never
> > reload it by using something similar to the kernel's READ_ONCE macro).
> >
> > But that means a completely new reloc api, so a lot more work. Also I
> > think it only makes sense to do that for drivers that really care about the
> > last bit of performance, and then do it within the driver so that there's
> > no constraints about abi.
>
> Indeed, I moved it into mesa so I could rework that. bo_emit_reloc()
> is showing up in profiles. The patch below along with NO_RELOC and
> HANDLE_LUT flags gives me 5-10% improvement on CPU bound benchmarks, so
> it's certainly worth it. I'm skeptical that a hashtable lookup per
> reloc emit is going to perform better than just fixing up the relocs
> at execbuf2 time though. It would be nice to not do any work at ioctl
> time, but for that you need a very fast way to map from bo to
> per-thread bo state as you go. Maybe a per-thread array mapping from
> gem handle to exec_object could work...
>
> WIP Patch is here:
>
> http://cgit.freedesktop.org/~krh/mesa/commit/?h=b0e4ce7bbce2a79ad37d6de460af88b9581ea1d7

Hmm, that is actually pretty neat. My idle thought was to create a
per-context batchmgr with its own view of the bo (to counter the
multithreaded free-for-all). In your patch, you neatly demonstrate that
you don't need a per-context view of the bo, only of the relocations.
And it will make drm_intel_bo_emit_reloc() a fixed cost, which should
account for most of your CPU overhead saving. However, I think if you do
take it a step further with a batchmgr_bo, you can make
drm_intel_bo_references() very cheap as well.

Looks good.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
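
For illustration, here is a minimal sketch of the "per-thread array
mapping from gem handle to exec_object" idea raised in the quoted text.
It is not taken from the WIP patch. Every name in it (sketch_batch,
sketch_emit_reloc, MAX_HANDLES and so on) is hypothetical, and the
structs are simplified stand-ins for the kernel's exec object and
relocation entry, not the real i915 uapi layouts. It assumes gem handles
are small integers that can index a fixed-size table directly, and that
relocations name their target by index into the object list, as is
understood to be the case with HANDLE_LUT.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HANDLES 1024  /* hypothetical cap on gem handle values */
#define MAX_OBJECTS 256   /* hypothetical cap on bos per batch */
#define MAX_RELOCS  4096  /* hypothetical cap on relocs per batch */
#define NO_INDEX    (-1)

/* Simplified stand-in for an exec object entry (not the uapi struct). */
struct sketch_exec_object {
    uint32_t handle;
    uint64_t presumed_offset;
};

/* Simplified stand-in for a relocation entry (not the uapi struct). */
struct sketch_reloc {
    uint32_t target_index;  /* index into objects[], assumed HANDLE_LUT style */
    uint32_t batch_offset;  /* where in the batch the address is written */
    uint32_t delta;         /* offset added to the target address */
};

/* One of these per thread (or per batch); nothing here is shared. */
struct sketch_batch {
    int32_t handle_to_index[MAX_HANDLES];
    struct sketch_exec_object objects[MAX_OBJECTS];
    struct sketch_reloc relocs[MAX_RELOCS];
    uint32_t object_count;
    uint32_t reloc_count;
};

static void sketch_batch_init(struct sketch_batch *b)
{
    for (uint32_t i = 0; i < MAX_HANDLES; i++)
        b->handle_to_index[i] = NO_INDEX;
    b->object_count = 0;
    b->reloc_count = 0;
}

/* Clear only the table slots this batch used, not the whole table. */
static void sketch_batch_reset(struct sketch_batch *b)
{
    for (uint32_t i = 0; i < b->object_count; i++) {
        assert(b->objects[i].handle < MAX_HANDLES);
        b->handle_to_index[b->objects[i].handle] = NO_INDEX;
    }
    b->object_count = 0;
    b->reloc_count = 0;
}

/* O(1): return the object index for a handle, adding it on first use. */
static int32_t sketch_batch_add_bo(struct sketch_batch *b, uint32_t handle,
                                   uint64_t presumed_offset)
{
    if (handle >= MAX_HANDLES)
        return NO_INDEX;
    int32_t index = b->handle_to_index[handle];
    if (index != NO_INDEX)
        return index;
    if (b->object_count >= MAX_OBJECTS)
        return NO_INDEX;
    index = (int32_t)b->object_count++;
    b->objects[index].handle = handle;
    b->objects[index].presumed_offset = presumed_offset;
    b->handle_to_index[handle] = index;
    return index;
}

/* O(1): one table lookup plus one streaming write to the reloc array. */
static int sketch_emit_reloc(struct sketch_batch *b, uint32_t target_handle,
                             uint64_t presumed_offset, uint32_t batch_offset,
                             uint32_t delta)
{
    if (b->reloc_count >= MAX_RELOCS)
        return -1;
    int32_t index = sketch_batch_add_bo(b, target_handle, presumed_offset);
    if (index == NO_INDEX)
        return -1;
    struct sketch_reloc *r = &b->relocs[b->reloc_count++];
    r->target_index = (uint32_t)index;
    r->batch_offset = batch_offset;
    r->delta = delta;
    return 0;
}
```

In this sketch the arrays are already in submission order when the batch
is finished, so nothing needs fixing up at execbuf time. The cost is a
table sized by the largest handle value, which a real driver would have
to grow on demand.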