On Wed, Mar 24, 2021 at 07:50:01PM +0300, Dmitry Osipenko wrote: > 24.03.2021 19:42, Thierry Reding пишет: > > On Wed, Mar 24, 2021 at 06:45:30PM +0300, Dmitry Osipenko wrote: > >> 24.03.2021 18:02, Thierry Reding пишет: > >>> On Wed, Mar 24, 2021 at 05:41:08PM +0300, Dmitry Osipenko wrote: > >>>> 23.03.2021 18:54, Thierry Reding пишет: > >>>>> From: Thierry Reding <treding@xxxxxxxxxx> > >>>>> > >>>>> Clarify when a fixed IOV address can be used and when a buffer has to > >>>>> be mapped before the IOVA can be used. > >>>>> > >>>>> Signed-off-by: Thierry Reding <treding@xxxxxxxxxx> > >>>>> --- > >>>>> drivers/gpu/drm/tegra/plane.c | 8 ++++++++ > >>>>> 1 file changed, 8 insertions(+) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c > >>>>> index 19e8847a164b..793da5d675d2 100644 > >>>>> --- a/drivers/gpu/drm/tegra/plane.c > >>>>> +++ b/drivers/gpu/drm/tegra/plane.c > >>>>> @@ -119,6 +119,14 @@ static int tegra_dc_pin(struct tegra_dc *dc, struct tegra_plane_state *state) > >>>>> dma_addr_t phys_addr, *phys; > >>>>> struct sg_table *sgt; > >>>>> > >>>>> + /* > >>>>> + * If we're not attached to a domain, we already stored the > >>>>> + * physical address when the buffer was allocated. If we're > >>>>> + * part of a group that's shared between all display > >>>>> + * controllers, we've also already mapped the framebuffer > >>>>> + * through the SMMU. In both cases we can short-circuit the > >>>>> + * code below and retrieve the stored IOV address. > >>>>> + */ > >>>>> if (!domain || dc->client.group) > >>>>> phys = &phys_addr; > >>>>> else > >>>>> > >>>> > >>>> This comment is correct, but the logic feels a bit lame because it > >>>> should be wasteful to re-map DMA on each FB flip. Personally I don't > >>>> care much about this since older Tegras use pinned buffers by default, > >>>> but this shouldn't be good for T124+ users. > >>> > >>> I'm not terribly thrilled by this either, but it's the only way to do > >>> this when using the DMA API because we don't know at allocation time (or > >>> import time for that matter) which of the (up to) 4 display controllers > >>> a framebuffer will be shown on. tegra_dc_pin() is the earliest where > >>> this is known and worst case that's called once per flip. > >>> > >>> When the IOMMU API is used explicitly, we always map framebuffers into > >>> the IOMMU domain shared by all display controllers at allocation or > >>> import time and then we don't need to pin at flip time anymore. > >>> > >>> I do have a work-in-progress patch somewhere that creates a mapping > >>> cache to mitigate this problem to some degree. I need to dig that up and > >>> do a few measurements because I vaguely recall this speeding up flips by > >>> quite a bit (well, except for the very first mapping, obviously). > >>> > >>>> Perhaps dumb buffers should be pinned to display by default and then we > >>>> should extend the Tegra UAPI to support BO mapping to display client(?). > >>> > >>> That would kind of defeat the purpose of a generic KMS UAPI. > >> > >> Couldn't the BOs be mapped when FB is created, i.e. by tegra_fb_create? > > > > I suppose that would be possible. However, tegra_fb_create() doesn't > > know a thing about display controllers, so we'd have to add extra code > > to it to iterate over all display controllers and do a dma_map_sg() of > > the GEM object for each of them. > > > > It's also somewhat wasteful because now we get a mapping for each > > framebuffer for each display controller. So if you've got, say, a four > > UHD screen setup (which is something that Tegra194 supports), you could > > end up with 8 UHD framebuffers (two for each display, for double- > > buffering) at 32 MiB each for a whopping 256 MiB of memory that needs to > > be mapped for each of the four display controllers. That 1 GiB worth of > > page table updates, whereas you really only need one fourth of that. > > > > Granted, this will make flipping a bit faster, and IOVA space isn't > > really a problem on Tegra194. It would still waste a bit of RAM for all > > those page table entries that we don't really need, though. > > > > A mapping cache seems like a much better compromise because the cache > > lookup should be quite fast compared to a mapping operation and we waste > > just a couple dozen bytes per mapping perhaps as opposed to a few > > megabytes for the gratuitous, preemptive mappings. > > Isn't it really possible to put displays into the same IOMMU group on > T194? It doesn't make much sense to have them in a separate groups on Linux. It is possible and in fact that's what's already happening. However, the problem isn't that these devices are not in the same group, the problem is that the DMA API doesn't know anything about groups. It works on struct device and if you've got DMA API debugging enabled it may even flag incorrect usage as errors. So from a DMA API point of view, if a device wants to use a buffer, that buffer first has to be mapped for that device, even if it was already mapped for a different device that happens to be in the same IOMMU group and hence share an IOMMU domain. Thierry
Attachment:
signature.asc
Description: PGP signature