Re: [PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 14.06.2017 10:56, Erik Faye-Lund wrote:
> On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
>> The blocking gather copy allocation is a major performance downside of the
>> Host1x firewall, it may take hundreds milliseconds which is unacceptable
>> for the real-time graphics operations. Let's try a non-blocking allocation
>> first as a least invasive solution, it makes opentegra (Xorg driver)
>> performance indistinguishable with/without the firewall.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>> ---
>>
>> The other much more invasive solution could be: to have a memory pool
>> reserved for the gather copy and performing the firewall checks (and the
>> gather copying) on the actual job submission to the CDMA, not on the job
>> allocation like it is done now. This solution reduces the overhead of a
>> gather copy allocations.
>>
>>  drivers/gpu/host1x/job.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
>> index fc194c676d91..ed0575f14093 100644
>> --- a/drivers/gpu/host1x/job.c
>> +++ b/drivers/gpu/host1x/job.c
>> @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev)
>>                 size += g->words * sizeof(u32);
>>         }
>>
>> +       /*
>> +        * Try a non-blocking allocation from a higher priority pools first,
>> +        * as awaiting for the allocation here is a major performance hit.
>> +        */
>>         job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
>> -                                              GFP_KERNEL);
>> -       if (!job->gather_copy_mapped) {
>> -               job->gather_copy_mapped = NULL;
>> +                                              GFP_NOWAIT);
>> +
>> +       /* the higher priority allocation failed, try the generic-blocking */
>> +       if (!job->gather_copy_mapped)
>> +               job->gather_copy_mapped = dma_alloc_wc(dev, size,
>> +                                                      &job->gather_copy,
>> +                                                      GFP_KERNEL);
>> +       if (!job->gather_copy_mapped)
>>                 return -ENOMEM;
>> -       }
>>
>>         job->gather_copy_size = size;
> 
> Can't we just have a static fixed-size buffer, and validate chunks at
> the time? Or why can't we validate directly on the mapped pointers?
> 
> It feels pretty silly to have to allocate memory in the first place here...
> 

We can't validate the mapped pointers because userspace may write to the buffers
memory at any time. Maybe it should be possible to write-protect cmdbuffers
memory for the time of its submission and execution, but I'm not sure whether
it's worth the effort. The current-proposed solution is efficient and least
invasive.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux