On Wed, Dec 14, 2016 at 03:01:56PM +0100, Lucas Stach wrote: > Am Mittwoch, den 14.12.2016, 14:48 +0100 schrieb Thierry Reding: > > On Wed, Dec 14, 2016 at 12:35:31PM +0100, Lucas Stach wrote: > > > Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen: > > > > Add a new IO virtual memory allocation API to allow clients to > > > > allocate non-GEM memory in the Tegra DRM IOMMU domain. This is > > > > required e.g. for loading client firmware when clients are attached > > > > to the IOMMU domain. > > > > > > > > The allocator allocates contiguous physical pages that are then > > > > mapped contiguously to the IOMMU domain using the iova_domain > > > > library provided by the kernel. Contiguous physical pages are > > > > used so that the same allocator works also when IOMMU support > > > > is disabled and therefore devices access physical memory directly. > > > > > > > Why is this needed? If you use the DMA API for those buffers you should > > > end up with CMA memory in the !IOMMU case and normal paged memory with > > > IOMMU enabled. From my understanding this should match the requirements. > > > > We can't currently use the DMA API for these allocations because it > > doesn't allow (or at least didn't back when this was first implemented) > > us to share a mapping between two devices. > > Hm, maybe I'm overlooking something, but isn't this just a matter of > allocating on one device, then constructing a SG list (dma_get_sgtable) > from the buffer you got and use that to dma_map_sg() the buffer on the > other device? Yes, that would work. What I was referring to is sharing framebuffers between multiple CRTCs. Back at the time when IOMMU support was first added, I tried to use the DMA API. However, the problem with that was that we would've had to effectively dma_map_sg() on every page-flip since the buffer is imported into the DRM device, but there's no call that would import it for each CRTC only once. So when the framebuffer is added to a plane, you have to map it to the corresponding display controller. And the dma_map_sg() was, if I recall correctly, on the order of 5-10 ms, which is prohibitively expensive to do per frame. It's also completely unnecessary because all CRTCs in a DRM device can simply share the same IOMMU domain. I can't think of a reason why you would want or need to use separate domains. Back at the time this was something that the DMA API couldn't do, it would simply assign a separate IOMMU domain per device. It's possible that this has changed now given that many others must've run into the same problem meanwhile. > Maybe doing the firmware buffer allocation on host1x (with a 4GB upper > bound) and then sharing the SG list to the devices? That's pretty much what this API is doing. Only it's the other way around: we don't share the SG list with other devices for mapping, we simply reuse the same mapping across multiple devices, since they're all in the same IOMMU domain. > > The reason why we need these patches is that when IOMMU is enabled, then > > the units' falcons will read memory through the IOMMU, so we must have > > allocations for GEM buffers and the firmware go through the same > > mechanism. > > Sorry for maybe dumb questions. > > Do you have any engines other than the GPU that can handle SG > themselves? No, I don't think so. > Wouldn't you want the GEM objects to be backed by CMA in the !MMU > case? That's exactly what's happening already. If no IOMMU is available we allocate buffer objects backing store with dma_alloc_wc(). > How are ordinary GEM objects different from the falcon firmware? They're not. I think we could probably reuse more of the BO allocation functions for the firmware as well. I think Mikko already agreed to look into that. We might have to add some special cases, or split up the helpers a little differently to avoid creating GEM objects from the firmware buffers. We wouldn't want userspace to start mmap()'ing those. Thierry
Attachment:
signature.asc
Description: PGP signature