On 23/06/17 07:49 PM, Marek Olšák wrote:
> On Fri, Jun 23, 2017 at 11:27 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> On 23.06.2017 at 11:08, zhoucm1 wrote:
>>> On 2017-06-23 17:01, zhoucm1 wrote:
>>>> On 2017-06-23 16:25, Christian König wrote:
>>>>> On 23.06.2017 at 09:09, zhoucm1 wrote:
>>>>>> On 2017-06-23 14:57, Christian König wrote:
>>>>>>>
>>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>>> instead of a BO list, as Marek suggested, would indeed save us some
>>>>>>> time here.
>>>>>>
>>>>>> Interesting. I have long been following how to improve our CS ioctl,
>>>>>> since UMD guys often complain that our command submission is slower
>>>>>> than on Windows. So how would we specify the BOs directly instead of
>>>>>> using a BO list? A BO handle array from the UMD? Could you guys
>>>>>> describe it more clearly? Is it doable?
>>>>>
>>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>>> closed-source UMDs. To be precise, we actually came up with the BO
>>>>> list approach because of their requirements.
>>>>>
>>>>> The biggest chunk of work during CS is reserving all the buffers,
>>>>> validating them and checking their VM status.
>>>>
>>>> Totally agree. Every time I read the code there, I want to optimize it.
>>>>
>>>>> It doesn't matter whether the BOs come from the BO list or directly
>>>>> in the CS IOCTL.
>>>>>
>>>>> The key point is that CS overhead is pretty much irrelevant for the
>>>>> open source stack, since Mesa does command submission from a separate
>>>>> thread anyway.
>>>>
>>>> If it is irrelevant for the open stack, then how does the open source
>>>> stack handle "the biggest chunk of work during CS: reserving all the
>>>> buffers, validating them and checking their VM status"?
>>
>> Command submission on the open stack is outsourced to a separate user
>> space thread. E.g. when an application triggers a flush, the IBs created
>> so far are just put on a queue, and another thread pushes them down to
>> the kernel.
>>
>> I mean, reducing the overhead of the CS IOCTL is always nice, but you
>> usually won't see any fps increase as long as not all CPUs are
>> completely bound to some tasks.
>>
>>>> If the open stack has a better way, I think the closed stack can
>>>> follow it; I don't know the history.
>>>
>>> Do you not use a BO list at all in Mesa? In radv as well?
>>
>> I don't think so. Mesa just sends the list of used BOs down to the
>> kernel with every IOCTL.
>
> The CS ioctl actually costs us some performance, but not as much as on
> closed source drivers.
>
> MesaGL always executes all CS ioctls in a separate thread (in parallel
> with the UMD) except for the last IB, which is submitted by SwapBuffers.

... or by an explicit glFinish or glFlush call (at least when the current
draw buffer isn't a back buffer), right?

> For us, it's certainly useful to optimize the CS ioctl because of apps
> that submit only 1 IB per frame, where multithreading has no effect or
> may even hurt performance.

Another possibility might be flushing earlier, e.g. when the GPU and/or CS
submission thread are idle. But optimizing the CS ioctl would still help
in that case.

Finding good heuristics which allow better utilization of the GPU / CS
submission thread and don't hurt performance in any scenario might be
tricky, though.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer