On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:
> On 2021-01-07 22:27, isaacm@xxxxxxxxxxxxxx wrote:
> > On 2021-01-06 03:56, Will Deacon wrote:
> > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to mark the memory as normal sys-cached for GPU to use
> > > > system cache. Without this, the system cache lines are not allocated
> > > > for GPU. We use the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a
> > > > page protection flag as the flag cannot be exposed via DMA api because
> > > > of no in-tree users.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > > index 7c9ea9d7874a..3fb7de8304a2 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >  		else if (prot & IOMMU_CACHE)
> > > >  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
> > > > +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >  	}
> > >
> > While this approach of enabling system cache globally for both page
> > tables and other buffers works for the GPU usecase, this isn't ideal
> > for other clients that use system cache. For example, video clients
> > only want to cache a subset of their buffers in the system cache, due
> > to the sizing constraint imposed by how much of the system cache they
> > can use. So, it would be ideal to have a way of expressing the desire
> > to use the system cache on a per-buffer basis. Additionally, our video
> > clients use the DMA layer, and since the requirement is for caching in
> > the system cache to be a per buffer attribute, it seems like we would
> > have to have a DMA attribute to express this on a per-buffer basis.
> >
>
> I did bring this up initially [1], also where is this video client
> in upstream? AFAIK, only system cache user in upstream is GPU.
> We cannot add any DMA attribute unless there is any user upstream
> as per [2], so when the support for such a client is added, wouldn't
> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
> work?

Hmm, I think this is another case where we need to separate out the
page-table walker attributes from the access attributes. Currently,
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
and I don't think it makes any sense for that to be per-buffer (how would
you even manage that?). However, if we want to extend this to data
accesses and we know that there are valid use-cases where this should be
per-buffer, then shoe-horning it in with the walker quirk does not feel
like the best thing to do.

As a starting point, we could:

  1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
  2. Add a new prot flag IOMMU_LLC
  3. Have the GPU pass the new prot for its buffer mappings

Does that work?

One thing I'm not sure about is whether IOMMU_CACHE should imply
IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer
non-cacheable mappings for a coherent device. Have you ever seen that
sort of thing before?

Will
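For illustration only, here is a minimal user-space sketch of the walker/data split proposed in the three steps above. It is not kernel code. IO_PGTABLE_QUIRK_PTW_LLC and IOMMU_LLC are only the names proposed in this thread, and their bit positions are arbitrary. The other flag values, the MAIR indices, and the TCR region encodings are assumptions modelled on io-pgtable-arm.c of this era. The point it shows is that the walker quirk only changes the table-walk (TCR) attributes, while the data-access memory type comes from a per-mapping prot flag. It also picks one possible answer to the open question: IOMMU_CACHE takes precedence over IOMMU_LLC.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, NOT the kernel implementation.
 * Values are assumed to mirror include/linux/iommu.h and io-pgtable.h;
 * IOMMU_LLC and IO_PGTABLE_QUIRK_PTW_LLC are proposed names only.
 */
#define IOMMU_READ   (1 << 0)
#define IOMMU_WRITE  (1 << 1)
#define IOMMU_CACHE  (1 << 2)
#define IOMMU_MMIO   (1 << 4)
#define IOMMU_LLC    (1 << 6)                 /* hypothetical prot flag */

#define IO_PGTABLE_QUIRK_PTW_LLC (1UL << 6)   /* hypothetical renamed quirk */

/* MAIR attribute indices (assumed to match io-pgtable-arm.c). */
enum mair_idx {
	MAIR_IDX_NC = 0,         /* Normal, non-cacheable */
	MAIR_IDX_CACHE = 1,      /* Normal, inner and outer write-back */
	MAIR_IDX_DEV = 2,        /* Device memory */
	MAIR_IDX_INC_OCACHE = 3, /* Inner non-cacheable, outer cacheable (LLC) */
};

/* TCR IRGN/ORGN cacheability encodings for table walks. */
enum tcr_rgn {
	RGN_NC = 0,   /* Non-cacheable */
	RGN_WBWA = 1, /* Write-back, read/write-allocate */
	RGN_WT = 2,   /* Write-through */
	RGN_WB = 3,   /* Write-back, no write-allocate */
};

struct ptw_attrs {
	enum tcr_rgn irgn;
	enum tcr_rgn orgn;
};

/* Step 1: the (renamed) quirk affects only the page-table walker. */
static struct ptw_attrs ptw_attrs_for(bool coherent_walk, unsigned long quirks)
{
	struct ptw_attrs w;

	if (coherent_walk) {
		w.irgn = RGN_WBWA;
		w.orgn = RGN_WBWA;
	} else {
		/* Non-coherent walker: optionally allocate walks in the LLC. */
		w.irgn = RGN_NC;
		w.orgn = (quirks & IO_PGTABLE_QUIRK_PTW_LLC) ? RGN_WBWA : RGN_NC;
	}
	return w;
}

/* Steps 2 and 3: data accesses are chosen per mapping via prot. */
static enum mair_idx prot_to_attr_idx(int prot)
{
	enum mair_idx idx;

	if (prot & IOMMU_MMIO)
		idx = MAIR_IDX_DEV;
	else if (prot & IOMMU_CACHE)
		idx = MAIR_IDX_CACHE;      /* open question: imply IOMMU_LLC? */
	else if (prot & IOMMU_LLC)
		idx = MAIR_IDX_INC_OCACHE; /* per-buffer system-cache request */
	else
		idx = MAIR_IDX_NC;

	assert((unsigned int)idx < 8u); /* AttrIndx is a 3-bit PTE field */
	return idx;
}
```

With this split, a GPU would set the walker quirk once per domain, and then pass IOMMU_LLC only for the buffers it wants in the system cache. A video client could do the same per buffer without affecting its page-table walks.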