On Thu, Nov 10, 2016 at 08:23:38PM +0200, Mikko Perttunen wrote: > Add a new IO virtual memory allocation API to allow clients to > allocate non-GEM memory in the Tegra DRM IOMMU domain. This is > required e.g. for loading client firmware when clients are attached > to the IOMMU domain. > > The allocator allocates contiguous physical pages that are then > mapped contiguously to the IOMMU domain using the iova_domain > library provided by the kernel. Contiguous physical pages are > used so that the same allocator works also when IOMMU support > is disabled and therefore devices access physical memory directly. > > Signed-off-by: Mikko Perttunen <mperttunen@xxxxxxxxxx> > --- > drivers/gpu/drm/tegra/drm.c | 98 ++++++++++++++++++++++++++++++++++++++++++--- > drivers/gpu/drm/tegra/drm.h | 7 ++++ > 2 files changed, 100 insertions(+), 5 deletions(-) This looks mostly good, just a few comments below... > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c > index b8be3ee..ecfe7ba 100644 > --- a/drivers/gpu/drm/tegra/drm.c > +++ b/drivers/gpu/drm/tegra/drm.c > @@ -1,12 +1,13 @@ > /* > * Copyright (C) 2012 Avionic Design GmbH > - * Copyright (C) 2012-2013 NVIDIA CORPORATION. All rights reserved. > + * Copyright (C) 2012-2015 NVIDIA CORPORATION. All rights reserved. It's almost 2017 now, I think this is out of date. =) > * > * This program is free software; you can redistribute it and/or modify > * it under the terms of the GNU General Public License version 2 as > * published by the Free Software Foundation. > */ > > +#include <linux/bitops.h> > #include <linux/host1x.h> > #include <linux/iommu.h> > > @@ -23,6 +24,8 @@ > #define DRIVER_MINOR 0 > #define DRIVER_PATCHLEVEL 0 > > +#define IOVA_AREA_SZ (1024 * 1024 * 64) /* 64 MiB */ SZ_64M? > + > struct tegra_drm_file { > struct list_head contexts; > }; > @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) > > if (iommu_present(&platform_bus_type)) { > struct iommu_domain_geometry *geometry; > - u64 start, end; > + unsigned long order; > + u64 iova_start, start, end; Can we be a little more explicit and keep an additional iova_end to simplify the code a little? > tegra->domain = iommu_domain_alloc(&platform_bus_type); > if (!tegra->domain) { > @@ -138,10 +142,17 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) > geometry = &tegra->domain->geometry; > start = geometry->aperture_start; > end = geometry->aperture_end; > + iova_start = end - IOVA_AREA_SZ + 1; > + > + order = __ffs(tegra->domain->pgsize_bitmap); > + init_iova_domain(&tegra->iova, 1UL << order, > + iova_start >> order, > + end >> order); I hadn't seen this IOVA domain thing and it looks rather handy. > > - DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n", > - start, end); > - drm_mm_init(&tegra->mm, start, end - start + 1); > + drm_mm_init(&tegra->mm, start, iova_start - start); > + > + DRM_DEBUG("IOMMU apertures initialized (GEM: %#llx-%#llx, IOVA: %#llx-%#llx)\n", > + start, iova_start-1, iova_start, end); Might be worth splitting this into two, or perhaps even three lines: IOMMU apertures: GEM: %#llx-%#llx IOVA: %#llx-%#llx Maybe also find a better name for IOVA, since GEM will be using I/O virtual addresses as well. Perhaps just name it "carveout" or something like that? > } > > mutex_init(&tegra->clients_lock); > @@ -208,6 +219,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) > if (tegra->domain) { > iommu_domain_free(tegra->domain); > drm_mm_takedown(&tegra->mm); > + put_iova_domain(&tegra->iova); > } > free: > kfree(tegra); > @@ -232,6 +244,7 @@ static int tegra_drm_unload(struct drm_device *drm) > if (tegra->domain) { > iommu_domain_free(tegra->domain); > drm_mm_takedown(&tegra->mm); > + put_iova_domain(&tegra->iova); > } > > kfree(tegra); > @@ -975,6 +988,81 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra, > return 0; > } > > +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, > + dma_addr_t *iova) Maybe name the iova parameter phys? iova implies that it's always translated, whereas according to the code it can be physical, too. > +{ > + struct iova *alloc; > + unsigned long shift; > + void *virt; > + gfp_t gfp; > + int err; > + > + if (tegra->domain) > + size = iova_align(&tegra->iova, size); > + else > + size = PAGE_ALIGN(size); > + > + gfp = GFP_KERNEL | __GFP_ZERO; > + if (!tegra->domain) { > + /* > + * Tegra210 has 64-bit physical addresses but units only support > + * 32-bit addresses, so if we don't have an IOMMU to do > + * translation, force allocations to be in the 32-bit range. > + */ > + gfp |= GFP_DMA; Technically we do support Tegra132 and that already has 64-bit physical addresses, so perhaps include that in the comment, or make it more generic: /* * Many units only support 32-bit addresses, even on 64-bit SoCs. * If there is no IOMMU to translate into a 32-bit IO virtual address * address space, force allocations to be in the lower 32-bit range. */ > + } > + > + virt = (void *)__get_free_pages(gfp, get_order(size)); > + if (!virt) > + return NULL; Perhaps we want to return an ERR_PTR()-encoded error code here so we can differentiate between out-of-memory and out-of-virtual-address-space conditions in the caller? Or perhaps other conditions as well. Also, is __get_free_pages() the right API here? I thought that it always returned contiguous memory, which, in the presence of an IOMMU, will be completely unnecessary. > + > + if (!tegra->domain) { > + /* > + * If IOMMU is disabled, devices address physical memory > + * directly. > + */ > + *iova = virt_to_phys(virt); > + return virt; > + } > + > + shift = iova_shift(&tegra->iova); > + alloc = alloc_iova(&tegra->iova, size >> shift, > + tegra->domain->geometry.aperture_end >> shift, true); This is fairly cumbersome to use. Maybe a drm_mm would be easier in this case? If not, could we at least store the upper limit of the carveout so that we don't have to use the really long expression above? Thierry
Attachment:
signature.asc
Description: PGP signature