Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx> · Wed, 30 Jun 2021 15:37:59 +0530

Hi Will,

On 2021-03-25 23:03, Will Deacon wrote:
On Tue, Mar 09, 2021 at 12:10:44PM +0530, Sai Prakash Ranjan wrote:
On 2021-02-05 17:38, Sai Prakash Ranjan wrote:
> On 2021-02-04 03:16, Will Deacon wrote:
> > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > > > >
> > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > >
> > > > > > >
> > > > > > > Can't we add the syscached memory type similar to what is done on android?
> > > > > >
> > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > >
> > > > > Currently we use writecombine mappings for everything, although there
> > > > > are some cases that we'd like to use cached (but have not merged
> > > > > patches that would give userspace a way to flush/invalidate)
> > > > >
> > > >
> > > > > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> > > > > little accelerator that sits on the connection from the GPU to DDR and caches
> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC
> > > > > which a) isn't interesting and b) takes up cache space for read operations.
> > > >
> > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost for
> > > > > us to use outside of the unfortunate per buffer hint.
> > > >
> > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
> > > > > different hint) and in that case we have all of the concerns that Will
> > > > > identified.
> > > >
> > >
> > > > For mismatched outer cacheability attributes which Will mentioned, I was
> > > > referring to [1] in android kernel.
> >
> > I've lost track of the conversation here :/
> >
> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> > > into the CPU and with what attributes? Rob said "writecombine for
> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
> >
>
> Rob answered this.
>
> > > Finally, we need to be careful when we use the word "hint" as "allocation
> > > hint" has a specific meaning in the architecture, and if we only mismatch on
> > > those then we're actually ok. But I think IOMMU_LLC is more than just a
> > > hint, since it actually drives eviction policy (i.e. it enables writeback).
> >
> > Sorry for the pedantry, but I just want to make sure we're all talking
> > about the same things!
> >
>
> > Sorry for the confusion, which was probably caused by my mention of
> > android. NWA (no write allocate) is an allocation hint which we can ignore
> > for now as it is not introduced yet in upstream.
>

Any chance of taking this forward? We do not want to miss out on a small fps
gain when the product gets released.

Do we have a solution to the mismatched virtual alias?

Sorry for the long delay on this thread.

On the mismatched virtual alias question, wasn't this already discussed at
length when initial support for the system cache [1] (which you reverted)
was added?

Excerpt from there,

"As seen in downstream kernels there are few non-coherent devices which
would not want to allocate in system cache, and therefore would want
Inner/Outer non-cached memory. So, we may want to either override the
attributes per-device, or as you suggested we may want to introduce
another memory type 'sys-cached' that can be added with its separate
infra."

As for DMA API usage, we do not have any upstream users (video will be
one if they decide to upstream that).
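
To make the attribute mismatch under discussion concrete, here is a rough
standalone sketch, not the patch itself. It assumes IOMMU_LLC would select an
"inner non-cacheable, outer write-back read/write-allocate" MAIR entry, much as
io-pgtable-arm picks MAIR attribute indices for IOMMU_MMIO and IOMMU_CACHE. The
enum names and both helper functions are hypothetical and simplified for
illustration; only the flag bit values and the MAIR byte encodings are taken
from the kernel headers and the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Existing IOMMU protection flags, plus the one proposed in this series. */
#define IOMMU_READ    (1 << 0)
#define IOMMU_WRITE   (1 << 1)
#define IOMMU_CACHE   (1 << 2)
#define IOMMU_NOEXEC  (1 << 3)
#define IOMMU_MMIO    (1 << 4)
#define IOMMU_PRIV    (1 << 5)
#define IOMMU_LLC     (1 << 6)   /* proposed: allocate in the system cache */

/* MAIR attribute bytes: bits [7:4] outer attributes, bits [3:0] inner. */
#define MAIR_ATTR_NC          0x44  /* Normal, inner and outer non-cacheable */
#define MAIR_ATTR_WBRWA       0xff  /* Normal, inner and outer WB RW-allocate */
#define MAIR_ATTR_DEVICE      0x04  /* Device-nGnRE */
#define MAIR_ATTR_INC_OWBRWA  0xf4  /* inner NC, outer WB RW-allocate */

/* Hypothetical names for the four MAIR slots used in this sketch. */
enum attr_idx {
	ATTR_IDX_NC = 0,
	ATTR_IDX_CACHE = 1,
	ATTR_IDX_DEV = 2,
	ATTR_IDX_INC_OCACHE = 3,
};

/* Pick a MAIR slot from the prot flags; IOMMU_CACHE wins over IOMMU_LLC. */
static enum attr_idx prot_to_attr_idx(int prot)
{
	if (prot & IOMMU_MMIO)
		return ATTR_IDX_DEV;
	if (prot & IOMMU_CACHE)
		return ATTR_IDX_CACHE;
	if (prot & IOMMU_LLC)
		return ATTR_IDX_INC_OCACHE;
	return ATTR_IDX_NC;
}

/* Look up the attribute byte programmed into a given MAIR slot. */
static uint8_t mair_attr(enum attr_idx idx)
{
	static const uint8_t mair[] = {
		[ATTR_IDX_NC]         = MAIR_ATTR_NC,
		[ATTR_IDX_CACHE]      = MAIR_ATTR_WBRWA,
		[ATTR_IDX_DEV]        = MAIR_ATTR_DEVICE,
		[ATTR_IDX_INC_OCACHE] = MAIR_ATTR_INC_OWBRWA,
	};

	assert(idx <= ATTR_IDX_INC_OCACHE);
	return mair[idx];
}

/* Non-zero when two Normal-memory aliases disagree on the outer attributes. */
static int outer_cacheability_mismatch(uint8_t cpu_attr, uint8_t dev_attr)
{
	return (cpu_attr >> 4) != (dev_attr >> 4);
}
```

With a CPU writecombine mapping (Normal non-cacheable, 0x44) and a device
mapping using the 0xf4 entry, outer_cacheability_mismatch() returns non-zero,
which is the mismatched alias raised earlier in the thread.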

[1] https://patchwork.kernel.org/project/linux-arm-msm/patch/20180615105329.26800-1-vivek.gautam@xxxxxxxxxxxxxx/

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation