On Wed, Jun 14, 2017 at 10:32 AM, Dmitry Osipenko <digetx@xxxxxxxxx> wrote: > On 14.06.2017 10:56, Erik Faye-Lund wrote: >> On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko <digetx@xxxxxxxxx> wrote: >>> The blocking gather copy allocation is a major performance downside of the >>> Host1x firewall, it may take hundreds milliseconds which is unacceptable >>> for the real-time graphics operations. Let's try a non-blocking allocation >>> first as a least invasive solution, it makes opentegra (Xorg driver) >>> performance indistinguishable with/without the firewall. >>> >>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> >>> --- >>> >>> The other much more invasive solution could be: to have a memory pool >>> reserved for the gather copy and performing the firewall checks (and the >>> gather copying) on the actual job submission to the CDMA, not on the job >>> allocation like it is done now. This solution reduces the overhead of a >>> gather copy allocations. >>> >>> drivers/gpu/host1x/job.c | 16 ++++++++++++---- >>> 1 file changed, 12 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c >>> index fc194c676d91..ed0575f14093 100644 >>> --- a/drivers/gpu/host1x/job.c >>> +++ b/drivers/gpu/host1x/job.c >>> @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) >>> size += g->words * sizeof(u32); >>> } >>> >>> + /* >>> + * Try a non-blocking allocation from a higher priority pools first, >>> + * as awaiting for the allocation here is a major performance hit. >>> + */ >>> job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy, >>> - GFP_KERNEL); >>> - if (!job->gather_copy_mapped) { >>> - job->gather_copy_mapped = NULL; >>> + GFP_NOWAIT); >>> + >>> + /* the higher priority allocation failed, try the generic-blocking */ >>> + if (!job->gather_copy_mapped) >>> + job->gather_copy_mapped = dma_alloc_wc(dev, size, >>> + &job->gather_copy, >>> + GFP_KERNEL); >>> + if (!job->gather_copy_mapped) >>> return -ENOMEM; >>> - } >>> >>> job->gather_copy_size = size; >> >> Can't we just have a static fixed-size buffer, and validate chunks at >> the time? Or why can't we validate directly on the mapped pointers? >> >> It feels pretty silly to have to allocate memory in the first place here... >> > > We can't validate the mapped pointers because userspace may write to the buffers > memory at any time. Maybe it should be possible to write-protect cmdbuffers > memory for the time of its submission and execution, but I'm not sure whether > it's worth the effort. The current-proposed solution is efficient and least > invasive. Right, good point. Reviewed-by: Erik Faye-Lund <kusmabite@xxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html