On Tue, 11 Feb 2025 14:46:56 +0100
Maxime Ripard <mripard@xxxxxxxxxx> wrote:

> Hi Boris,
>
> On Fri, Feb 07, 2025 at 04:02:53PM +0100, Boris Brezillon wrote:
> > Sorry for joining the party late, a couple of comments to back Akash
> > and Nicolas' concerns.
> >
> > On Wed, 05 Feb 2025 13:14:14 -0500
> > Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >
> > > On Wednesday, 5 February 2025 at 15:52 +0100, Maxime Ripard wrote:
> > > > On Mon, Feb 03, 2025 at 04:43:23PM +0000, Florent Tomasin wrote:
> > > > > Hi Maxime, Nicolas
> > > > >
> > > > > On 30/01/2025 17:47, Nicolas Dufresne wrote:
> > > > > > On Thursday, 30 January 2025 at 17:38 +0100, Maxime Ripard wrote:
> > > > > > > Hi Nicolas,
> > > > > > >
> > > > > > > On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote:
> > > > > > > > On Thursday, 30 January 2025 at 14:46 +0100, Maxime Ripard wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I started to review it, but it's probably best to discuss it here.
> > > > > > > > >
> > > > > > > > > On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > This is a patch series covering the support for protected mode
> > > > > > > > > > execution in the Mali Panthor CSF kernel driver.
> > > > > > > > > >
> > > > > > > > > > The Mali CSF GPUs come with support for protected mode execution
> > > > > > > > > > at the HW level. This feature requires two main changes in the
> > > > > > > > > > kernel driver:
> > > > > > > > > >
> > > > > > > > > > 1) Configure the GPU with a protected buffer. The system must
> > > > > > > > > >    provide a DMA heap from which the driver can allocate a
> > > > > > > > > >    protected buffer. It can be a carved-out memory region or a
> > > > > > > > > >    dynamically allocated protected memory region. Some systems
> > > > > > > > > >    include a trusted FW which is in charge of the protected
> > > > > > > > > >    memory. Since this problem is integration-specific, the Mali
> > > > > > > > > >    Panthor CSF kernel driver must import the protected memory
> > > > > > > > > >    from a device-specific exporter.
> > > > > > > > >
> > > > > > > > > Why do you need a heap for it in the first place? My understanding of
> > > > > > > > > your series is that you have a carved-out memory region somewhere, and
> > > > > > > > > you want to allocate your buffers from that carved-out memory region.
> > > > > > > > >
> > > > > > > > > How is that any different from using a reserved-memory region, adding
> > > > > > > > > the reserved-memory property to the GPU device and doing all your
> > > > > > > > > allocation through the usual dma_alloc_* API?
> > > > > > > >
> > > > > > > > How do you then multiplex this region so it can be shared between
> > > > > > > > GPU/Camera/Display/Codec drivers and also userspace?
> > > > > > >
> > > > > > > You could point all the devices to the same reserved memory region, and
> > > > > > > they would all allocate from there, including for their userspace-facing
> > > > > > > allocations.
> > > > > >
> > > > > > I get that using a memory region is somewhat more of an HW description,
> > > > > > and aligned with what a DT is supposed to describe. One of the
> > > > > > challenges is that the Mediatek heap proposal ends up calling into
> > > > > > their TEE, meaning that knowing the region is not that useful. You
> > > > > > actually need the TEE APP GUID and its IPC protocol. If we can tell
> > > > > > drivers to use a heap instead, we can abstract that SoC-specific
> > > > > > complexity. I believe each allocated address has to be mapped to a
> > > > > > zone, and that can only be done in the secure application. I can
> > > > > > imagine similar needs when the protection is done using some sort of
> > > > > > a VM / hypervisor.
> > > > > >
> > > > > > Nicolas
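For illustration, the reserved-memory route suggested above would look
roughly like this on the consumer-driver side. This sketch is not from the
series: the function name is made up, and it assumes the carve-out is
declared as a "shared-dma-pool" reserved-memory node referenced by the
GPU's "memory-region" property, so the regular DMA API can serve
allocations from it.

#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>

/* Hypothetical helper: hook the device up to its DT carve-out and
 * allocate from it with the usual DMA API. Error handling trimmed.
 */
static void *panthor_carveout_alloc(struct device *dev, size_t size,
				    dma_addr_t *dma)
{
	/* Binds the device to the region pointed to by "memory-region". */
	if (of_reserved_mem_device_init(dev))
		return NULL;

	/* Subsequent dma_alloc_* calls are served from the carve-out. */
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

The catch, as Nicolas notes above, is that on platforms like the Mediatek
one the allocation also has to go through the TEE, which a bare region
cannot express.
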
> > > > >
> > > > > The idea in this design is to abstract the heap management from the
> > > > > Panthor kernel driver (which consumes a DMA buffer from it).
> > > > >
> > > > > In a system, an integrator would have implemented a secure heap
> > > > > driver, which could be based on a TEE, a carved-out memory region
> > > > > with restricted access, or something else. This heap driver would
> > > > > be responsible for implementing the logic to allocate, free,
> > > > > refcount, etc.
> > > > >
> > > > > The heap would be retrieved by the Panthor kernel driver in order
> > > > > to allocate protected memory to load the FW and allow the GPU to
> > > > > enter/exit protected mode. This memory would not belong to a user
> > > > > space process. The driver allocates it at the time of loading the
> > > > > FW and initializing the GPU HW. This is protected memory globally
> > > > > owned by the device.
> > > >
> > > > The thing is, it's really not clear why you absolutely need to have the
> > > > Panthor driver involved there. It won't be transparent to userspace,
> > > > since you'd need an extra flag at allocation time, and the buffers
> > > > behave differently. If userspace has to be aware of it, what's the
> > > > advantage of your approach compared to just exposing a heap for those
> > > > secure buffers, and letting userspace allocate its buffers from there?
> > >
> > > Unless I'm mistaken, the Panthor driver loads its own firmware. Since
> > > loading the firmware requires placing the data in a protected memory
> > > region, and this aspect has no exposure to userspace, how can Panthor
> > > not be implicated?
> >
> > Right, the very reason we need protected memory early is because some
> > FW sections need to be allocated from the protected pool, otherwise the
> > TEE will fault as soon as the FW enters the so-called 'protected mode'.
>
> How does that work if you don't have some way to allocate the protected
> memory? You can still submit jobs to the GPU, but you can't submit /
> execute "protected jobs"?

Exactly.

>
> > Now, it's not impossible to work around this limitation. For instance,
> > we could load the FW without this protected section by default (what we
> > do right now), and then provide a DRM_PANTHOR_ENABLE_FW_PROT_MODE
> > ioctl that would take a GEM object imported from a dmabuf allocated
> > from the protected dma-heap by userspace. We can then reset the FW and
> > allow it to operate in protected mode after that point.
>
> Urgh, I'd rather avoid that dance if possible :)

Me too.

>
> > This approach has two downsides though:
> >
> > 1. We have no way of checking that the memory we're passed is actually
> >    suitable for FW execution in a protected context. If we're passed
> >    random memory, this will likely hang the platform as soon as we
> >    enter protected mode.
>
> It's a current limitation of dma-buf in general, and you'd have the same
> issue right now if someone imports a buffer, or misconfigures the heap
> as a !protected heap.
>
> I'd really like to have some way to store some metadata in dma_buf, if
> only to tell that the buffer is protected.

The dma_buf has a pointer to its ops, so it should be relatively easy to
add an is_dma_buf_coming_from_this_heap() helper. Of course this implies
linking the consumer driver to the heap it's supposed to take protected
buffers from, which is basically the thing being discussed here :-).
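A minimal sketch of what that helper could look like, assuming the heap
(or a small header it provides) exposes its dma_buf_ops so consumers can
compare against it; the name comes from the suggestion above and the
signature is illustrative only:

#include <linux/dma-buf.h>

/* An exporter is identified by the ops it installed on the buffer, so
 * checking the buffer's origin boils down to comparing the ops pointer
 * against the (hypothetically exported) heap ops.
 */
static bool is_dma_buf_coming_from_this_heap(struct dma_buf *buf,
					     const struct dma_buf_ops *heap_ops)
{
	return buf->ops == heap_ops;
}
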
>
> I suspect you'd also need that if you do things like protected video
> playback through a codec, get a protected frame, and want to import that
> into the GPU. Depending on how you allocate it, either the codec or the
> GPU or both will want to make sure it's protected.

If it's all allocated from a central "protected" heap (even if that goes
through the driver calling dma_heap_alloc_buffer()), it shouldn't be an
issue.

>
> > 2. If the driver has already booted the FW and exposed a DRI node, we
> >    might have GPU workloads running, and doing a FW reset might incur
> >    a slight delay in GPU job execution.
> >
> > I think #1 is a more general issue that applies to suspend buffers
> > allocated for GPU contexts too. If we expose ioctls where we take
> > protected memory buffers that can possibly lead to crashes if they are
> > not real protected memory regions, and we have no way to ensure the
> > memory is protected, we probably want to restrict these ioctls/modes to
> > some high-privilege CAP_SYS_.
> >
> > For #2, that's probably something we can live with, since it's a
> > one-shot thing. If it becomes an issue, we can even make sure we enable
> > the FW protected-mode before the GPU starts being used for real.
> >
> > This being said, I think the problem applies outside Panthor, and it
> > might be that the video codec can't reset the FW/HW block to switch to
> > protected mode as easily as Panthor.
> >
> > Note that there are also downsides to the reserved-memory node
> > approach, where some bootloader stage would ask the secure FW to
> > reserve a portion of memory and pass this through the DT. This sort of
> > thing tends to be an integration mess, where you need all the pieces
> > of the stack (TEE, u-boot, MTK dma-heap driver, gbm, ...) to be at a
> > certain version to work properly. If we go the ioctl() way, we
> > restrict the scope to the TEE, gbm/mesa and the protected-dma-heap
> > driver, which is still a lot, but we've ripped the bootloader out of
> > the equation at least.
>
> Yeah. I also think there are two discussions in parallel here:
>
> 1) Being able to allocate protected buffers from the driver
> 2) Exposing an interface to allocate those to userspace
>
> I'm not really convinced we need 2, but 1 is obviously needed from what
> you're saying.

I suspect we need #2 for GBM, still. But that's what dma-heaps are for,
so I don't think that's a problem.
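For reference, point 2) wouldn't need new uAPI on the heap side: userspace
(GBM/mesa) could already allocate from such a heap through the existing
dma-heap ioctl, along these lines. The "protected" heap name is
hypothetical and integration-dependent, and error handling is trimmed.

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

/* Allocate a dma-buf from a (hypothetical) "protected" heap using the
 * standard DMA_HEAP_IOCTL_ALLOC interface. The returned fd can then be
 * imported by the GPU or codec driver as a GEM object / protected frame.
 */
static int alloc_protected_dmabuf(size_t size)
{
	struct dma_heap_allocation_data data = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int ret, heap;

	heap = open("/dev/dma_heap/protected", O_RDONLY | O_CLOEXEC);
	if (heap < 0)
		return -1;

	ret = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap);

	return ret < 0 ? -1 : (int)data.fd;
}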