[PATCH 1/3] drm/amdgpu: fix a typo

maraeo@xxxxxxxxx (Marek Olšák) · Fri, 23 Jun 2017 12:49:14 +0200

On Fri, Jun 23, 2017 at 11:27 AM, Christian König
<deathsimple at vodafone.de> wrote:
> On 23.06.2017 at 11:08, zhoucm1 wrote:
>>
>>
>>
>> On 2017-06-23 17:01, zhoucm1 wrote:
>>>
>>>
>>>
>>> On 2017-06-23 16:25, Christian König wrote:
>>>>
>>>> On 23.06.2017 at 09:09, zhoucm1 wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2017-06-23 14:57, Christian König wrote:
>>>>>>
>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>> instead of a BO list like Marek suggested would indeed save us some time
>>>>>> here.
>>>>>
>>>>> Interesting. I always follow how to improve our CS ioctl, since the UMD
>>>>> guys often complain that our command submission is slower than on Windows.
>>>>> So how would one directly specify the BOs instead of a BO list? A BO
>>>>> handle array from the UMD? Could you guys describe it more clearly? Is it
>>>>> doable?
>>>>
>>>>
>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>> closed source UMDs. To be precise, we actually came up with the BO list
>>>> approach because of their requirement.
>>>>
>>>> The biggest bunch of work during CS is reserving all the buffers,
>>>> validating them and checking their VM status.
>>>
>>> Totally agree. Every time I read the code there, I want to
>>> optimize it.
>>>
>>>> It doesn't matter if the BOs come from the BO list or directly in the CS
>>>> IOCTL.
>>>>
>>>> The key point is that CS overhead is pretty much irrelevant for the open
>>>> source stack, since Mesa does command submission from a separate thread
>>>> anyway.
>>>
>>> If it is irrelevant for the open stack, then how does the open source stack handle
>>> "The biggest bunch of work during CS is reserving all the buffers,
>>> validating them and checking their VM status."?
>
>
> Command submission on the open stack is outsourced to a separate user space
> thread. E.g. when an application triggers a flush the IBs created so far are
> just put on a queue and another thread pushes them down to the kernel.
>
> I mean reducing the overhead of the CS IOCTL is always nice, but you usually
> won't see any fps increase unless all CPUs are already completely busy with
> other tasks.
>
>>> If the open stack has a better way, I think the closed stack can follow it;
>>> I don't know the history.
>>
>> Do you not use the BO list at all in Mesa? In radv as well?
>
>
> I don't think so. Mesa just wants to send the list of used BOs down to the
> kernel with every IOCTL.

The CS ioctl actually costs us some performance, but not as much as on
closed source drivers.

MesaGL always executes all CS ioctls in a separate thread (in parallel
with the UMD) except for the last IB that's submitted by SwapBuffers.
SwapBuffers requires that all IBs have been submitted when SwapBuffers
returns. For example, if you have 5 IBs per frame, 4 of them are
executed on the thread and the overhead is hidden. The last one is
executed on the thread too, but this time the Mesa driver has to wait
for it. For things like glxgears with only 1 IB per frame, the thread
doesn't hide anything and Mesa always has to wait for it after
submission, just because of SwapBuffers.
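
To make the threading model above concrete, here is a minimal Python
sketch. It is not Mesa's actual code: SubmitThread, flush and swap_buffers
are hypothetical names, and cs_ioctl is just a stand-in callable for the
real CS ioctl. It only illustrates that ordinary flushes return immediately
while the SwapBuffers path has to block until the queue has drained.

```python
import queue
import threading


class SubmitThread:
    """Toy model of a driver-side submission thread (all names hypothetical)."""

    def __init__(self, cs_ioctl):
        self._cs_ioctl = cs_ioctl      # stand-in for the real CS ioctl
        self._queue = queue.Queue()    # IBs waiting to be pushed to the kernel
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Pop IBs in FIFO order and "submit" them one at a time.
        while True:
            ib = self._queue.get()
            self._cs_ioctl(ib)
            self._queue.task_done()

    def flush(self, ib):
        """Queue an IB and return at once; the ioctl cost is hidden."""
        self._queue.put(ib)

    def swap_buffers(self, last_ib):
        """Queue the last IB, then wait until every queued IB is submitted."""
        self._queue.put(last_ib)
        self._queue.join()


submitted = []
st = SubmitThread(submitted.append)
for i in range(4):
    st.flush(f"ib{i}")        # 4 IBs: the caller does not wait for these
st.swap_buffers("ib4")        # 5th IB: the caller waits here
print(submitted)              # prints ['ib0', 'ib1', 'ib2', 'ib3', 'ib4']
```

With a single IB per frame only swap_buffers() is ever called, so the
caller waits for every submission and the thread hides nothing.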

Having 10 or more IBs per frame is great, because 9 are done in
parallel and the last one is synchronous. The final CPU cost is 10x
lower, but it's not zero.
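
The arithmetic behind the 10x figure can be sketched as follows, under the
simplifying assumptions that every CS ioctl costs the same and that the
submission thread has already caught up on the earlier IBs by the time
SwapBuffers is called. The function name and the 50 us per-ioctl cost are
made up for illustration; they are not measurements.

```python
def visible_cs_cost(ibs_per_frame, cost_per_ioctl_us):
    """Return (total, visible) CS ioctl cost per frame in microseconds.

    total   - cost of all CS ioctls in the frame
    visible - cost the driver actually waits for (only the last IB)
    """
    if ibs_per_frame < 1:
        raise ValueError("need at least one IB per frame")
    total = ibs_per_frame * cost_per_ioctl_us
    visible = cost_per_ioctl_us   # only the SwapBuffers IB is synchronous
    return total, visible


for n in (1, 5, 10):
    total, visible = visible_cs_cost(n, 50)
    print(n, total, visible, total // visible)
```

For 10 IBs per frame the visible cost is one tenth of the total; for 1 IB
per frame the two are equal, which is the glxgears case.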

For us, it's certainly useful to optimize the CS ioctl because of apps
that submit only 1 IB per frame where multithreading has no effect or
may even hurt performance.

The most obvious inefficiency is the BO_LIST ioctl that is completely
unnecessary and only slows us down. What we need is exactly what
radeon does.
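
As a rough illustration of why the separate BO_LIST step hurts, here is a
toy model that only counts kernel entries per submission. It assumes the
user-space driver creates and destroys a BO list around each submission,
and that "what radeon does" means the buffer list travels inside the CS
ioctl itself. FakeKernel and all of its methods are hypothetical; none of
this is the real amdgpu or radeon UAPI.

```python
class FakeKernel:
    """Counts ioctl-like calls; not the real amdgpu or radeon interface."""

    def __init__(self):
        self.ioctl_calls = 0
        self._lists = {}
        self._next_id = 1

    def bo_list_create(self, handles):
        self.ioctl_calls += 1
        list_id = self._next_id
        self._next_id += 1
        self._lists[list_id] = tuple(handles)
        return list_id

    def bo_list_destroy(self, list_id):
        self.ioctl_calls += 1
        del self._lists[list_id]

    def cs_with_list(self, ib, list_id):
        self.ioctl_calls += 1
        return self._lists[list_id]      # buffers the kernel would reserve

    def cs_with_handles(self, ib, handles):
        self.ioctl_calls += 1
        return tuple(handles)            # same buffers, no extra object


def submit_via_bo_list(kernel, ib, handles):
    """Separate BO list object: create, submit, destroy."""
    list_id = kernel.bo_list_create(handles)
    try:
        return kernel.cs_with_list(ib, list_id)
    finally:
        kernel.bo_list_destroy(list_id)


def submit_inline(kernel, ib, handles):
    """Buffer handles passed directly with the submission."""
    return kernel.cs_with_handles(ib, handles)


k_list, k_inline = FakeKernel(), FakeKernel()
submit_via_bo_list(k_list, "ib0", [3, 7, 9])
submit_inline(k_inline, "ib0", [3, 7, 9])
print(k_list.ioctl_calls, k_inline.ioctl_calls)   # prints 3 1
```

Both paths hand the kernel the same set of buffers; the model says nothing
about reservation or validation cost, only about the extra round trips.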

Marek