On Fri, Jun 23, 2017 at 11:27 AM, Christian König <deathsimple at vodafone.de> wrote:
> On 23.06.2017 at 11:08, zhoucm1 wrote:
>>
>> On 2017-06-23 17:01, zhoucm1 wrote:
>>>
>>> On 2017-06-23 16:25, Christian König wrote:
>>>>
>>>> On 23.06.2017 at 09:09, zhoucm1 wrote:
>>>>>
>>>>> On 2017-06-23 14:57, Christian König wrote:
>>>>>>
>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>> instead of a BO list like Marek suggested would indeed save us some
>>>>>> time here.
>>>>>
>>>>> Interesting, I always follow how to improve our CS ioctl, since UMD
>>>>> guys often complain our command submission is slower than Windows.
>>>>> Then how do we directly specify the BOs instead of a BO list? A BO
>>>>> handle array from the UMD? Could you guys describe this more clearly?
>>>>> Is it doable?
>>>>
>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>> closed source UMDs. To be precise, we actually came up with the BO list
>>>> approach because of their requirement.
>>>>
>>>> The biggest bunch of work during CS is reserving all the buffers,
>>>> validating them and checking their VM status.
>>>
>>> Totally agree. Every time I read the code there, I want to optimize it.
>>>
>>>> It doesn't matter if the BOs come from the BO list or directly in the
>>>> CS IOCTL.
>>>>
>>>> The key point is that CS overhead is pretty much irrelevant for the
>>>> open source stack, since Mesa does command submission from a separate
>>>> thread anyway.
>>>
>>> If it is irrelevant for the open stack, then how does the open source
>>> stack handle "The biggest bunch of work during CS is reserving all the
>>> buffers, validating them and checking their VM status."?
>
> Command submission on the open stack is outsourced to a separate user
> space thread. E.g. when an application triggers a flush, the IBs created
> so far are just put on a queue and another thread pushes them down to the
> kernel.
>
> I mean reducing the overhead of the CS IOCTL is always nice, but you
> usually won't see any fps increase as long as not all CPUs are completely
> bound to some tasks.
>
>>> If the open stack has a better way, I think the closed stack can follow
>>> it. I don't know the history.
>>
>> Do you not use the BO list at all in Mesa? radv as well?
>
> I don't think so. Mesa just wants to send the list of used BOs down to the
> kernel with every IOCTL.

The CS ioctl actually costs us some performance, but not as much as on
closed source drivers.

MesaGL always executes all CS ioctls in a separate thread (in parallel
with the UMD) except for the last IB that's submitted by SwapBuffers.
SwapBuffers requires that all IBs have been submitted when SwapBuffers
returns. For example, if you have 5 IBs per frame, 4 of them are
executed on the thread and the overhead is hidden. The last one is
executed on the thread too, but this time the Mesa driver has to wait
for it. For things like glxgears with only 1 IB per frame, the thread
doesn't hide anything and Mesa always has to wait for it after
submission, just because of SwapBuffers.

Having 10 or more IBs per frame is great, because 9 are done in
parallel and the last one is synchronous. The final CPU cost is 10x
lower, but it's not zero.

For us, it's certainly useful to optimize the CS ioctl because of apps
that submit only 1 IB per frame, where multithreading has no effect or
may even hurt performance.

The most obvious inefficiency is the BO_LIST ioctl, which is completely
unnecessary for us and only slows us down. What we need is exactly what
radeon does.

Marek
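
To make the IBs-per-frame arithmetic concrete, here is a toy model of how much CS ioctl time the application thread actually waits for in one frame. This is only a sketch, not Mesa code: the function name `visible_cs_cost` and the 50 us per-ioctl cost are made up for illustration. It assumes the submission thread always keeps up and ignores any queueing overhead.

```python
def visible_cs_cost(ibs_per_frame, ioctl_cost_us, threaded=True):
    """Return the CS ioctl time (in microseconds) that the application
    thread has to wait for in one frame, under the toy model above."""
    if ibs_per_frame <= 0:
        return 0.0
    if not threaded:
        # Without a submission thread, every ioctl runs on the
        # application thread and its full cost is visible.
        return ibs_per_frame * ioctl_cost_us
    # With a submission thread, all IBs are queued, but only the last
    # one is waited for, because SwapBuffers must not return before
    # every IB has been submitted.
    return 1 * ioctl_cost_us


# Hypothetical per-ioctl cost of 50 us, purely for illustration.
for n in (1, 5, 10):
    sync = visible_cs_cost(n, 50.0, threaded=False)
    hidden = visible_cs_cost(n, 50.0, threaded=True)
    print(n, sync, hidden, sync / hidden)
```

With 1 IB per frame the two cases come out the same, which is the glxgears situation; with 10 IBs the visible cost is one tenth of the synchronous cost, but never zero.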