On Tue, 11 Feb 2025 14:46:56 +0100
Maxime Ripard <mripard@xxxxxxxxxx> wrote:

> Hi Boris,
>
> On Fri, Feb 07, 2025 at 04:02:53PM +0100, Boris Brezillon wrote:
> > Sorry for joining the party late, a couple of comments to back Akash
> > and Nicolas' concerns.
> >
> > On Wed, 05 Feb 2025 13:14:14 -0500
> > Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >
> > > On Wednesday, 5 February 2025 at 15:52 +0100, Maxime Ripard wrote:
> > > > On Mon, Feb 03, 2025 at 04:43:23PM +0000, Florent Tomasin wrote:
> > > > > Hi Maxime, Nicolas
> > > > >
> > > > > On 30/01/2025 17:47, Nicolas Dufresne wrote:
> > > > > > On Thursday, 30 January 2025 at 17:38 +0100, Maxime Ripard wrote:
> > > > > > > Hi Nicolas,
> > > > > > >
> > > > > > > On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote:
> > > > > > > > On Thursday, 30 January 2025 at 14:46 +0100, Maxime Ripard wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I started to review it, but it's probably best to discuss it here.
> > > > > > > > >
> > > > > > > > > On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > This is a patch series covering the support for protected mode
> > > > > > > > > > execution in the Mali Panthor CSF kernel driver.
> > > > > > > > > >
> > > > > > > > > > The Mali CSF GPUs come with support for protected mode execution
> > > > > > > > > > at the HW level. This feature requires two main changes in the
> > > > > > > > > > kernel driver:
> > > > > > > > > >
> > > > > > > > > > 1) Configure the GPU with a protected buffer. The system must
> > > > > > > > > >    provide a DMA heap from which the driver can allocate a
> > > > > > > > > >    protected buffer. It can be a carved-out memory region or a
> > > > > > > > > >    dynamically allocated protected memory region. Some systems
> > > > > > > > > >    include a trusted FW which is in charge of the protected
> > > > > > > > > >    memory. Since this problem is integration-specific, the Mali
> > > > > > > > > >    Panthor CSF kernel driver must import the protected memory
> > > > > > > > > >    from a device-specific exporter.
> > > > > > > > >
> > > > > > > > > Why do you need a heap for it in the first place? My understanding of
> > > > > > > > > your series is that you have a carved-out memory region somewhere, and
> > > > > > > > > you want to allocate your buffers from that carved-out memory region.
> > > > > > > > >
> > > > > > > > > How is that any different from using a reserved-memory region, adding
> > > > > > > > > the reserved-memory property to the GPU device and doing all your
> > > > > > > > > allocation through the usual dma_alloc_* API?
> > > > > > > >
> > > > > > > > How do you then multiplex this region so it can be shared between
> > > > > > > > GPU/Camera/Display/Codec drivers and also userspace?
> > > > > > >
> > > > > > > You could point all the devices to the same reserved memory region, and
> > > > > > > they would all allocate from there, including for their userspace-facing
> > > > > > > allocations.
> > > > > >
> > > > > > I get that using a memory region is somewhat more of an HW description,
> > > > > > and aligned with what a DT is supposed to describe. One of the
> > > > > > challenges is that the Mediatek heap proposal ends up calling into
> > > > > > their TEE, meaning that knowing the region is not that useful. You
> > > > > > actually need the TEE APP GUID and its IPC protocol. If we can tell
> > > > > > drivers to use a heap instead, we can abstract that SoC-specific
> > > > > > complexity. I believe each allocated address has to be mapped to a
> > > > > > zone, and that can only be done in the secure application. I can
> > > > > > imagine similar needs when the protection is done using some sort of
> > > > > > a VM / hypervisor.
> > > > > >
> > > > > > Nicolas
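For illustration, the reserved-memory route suggested above would look
roughly like this on the consumer-driver side. This sketch is not from the
series: the function name is made up, and it assumes the carve-out is
declared as a "shared-dma-pool" reserved-memory node referenced by the
GPU's "memory-region" property, so the regular DMA API can serve
allocations from it.

#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>

/* Hypothetical helper: hook the device up to its DT carve-out and
 * allocate from it with the usual DMA API. Error handling trimmed.
 */
static void *panthor_carveout_alloc(struct device *dev, size_t size,
				    dma_addr_t *dma)
{
	/* Binds the device to the region pointed to by "memory-region". */
	if (of_reserved_mem_device_init(dev))
		return NULL;

	/* Subsequent dma_alloc_* calls are served from the carve-out. */
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

The catch, as Nicolas notes above, is that on platforms like the Mediatek
one the allocation also has to go through the TEE, which a bare region
cannot express.
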
> > > > >
> > > > > The idea in this design is to abstract the heap management from the
> > > > > Panthor kernel driver (which consumes a DMA buffer from it).
> > > > >
> > > > > In a system, an integrator would have implemented a secure heap
> > > > > driver, which could be based on a TEE, a carved-out memory region
> > > > > with restricted access, or something else. This heap driver would
> > > > > be responsible for implementing the logic to allocate, free,
> > > > > refcount, etc.
> > > > >
> > > > > The heap would be retrieved by the Panthor kernel driver in order
> > > > > to allocate protected memory to load the FW and allow the GPU to
> > > > > enter/exit protected mode. This memory would not belong to a user
> > > > > space process. The driver allocates it at the time of loading the
> > > > > FW and initializing the GPU HW. This is protected memory globally
> > > > > owned by the device.
> > > >
> > > > The thing is, it's really not clear why you absolutely need to have the
> > > > Panthor driver involved there. It won't be transparent to userspace,
> > > > since you'd need an extra flag at allocation time, and the buffers
> > > > behave differently. If userspace has to be aware of it, what's the
> > > > advantage of your approach compared to just exposing a heap for those
> > > > secure buffers, and letting userspace allocate its buffers from there?
> > >
> > > Unless I'm mistaken, the Panthor driver loads its own firmware. Since
> > > loading the firmware requires placing the data in a protected memory
> > > region, and this aspect has no exposure to userspace, how can Panthor
> > > not be implicated?
> >
> > Right, the very reason we need protected memory early is because some
> > FW sections need to be allocated from the protected pool, otherwise the
> > TEE will fault as soon as the FW enters the so-called 'protected mode'.
>
> How does that work if you don't have some way to allocate the protected
> memory? You can still submit jobs to the GPU, but you can't submit /
> execute "protected jobs"?

Exactly.

>
> > Now, it's not impossible to work around this limitation. For instance,
> > we could load the FW without this protected section by default (what we
> > do right now), and then provide a DRM_PANTHOR_ENABLE_FW_PROT_MODE
> > ioctl that would take a GEM object imported from a dmabuf allocated
> > from the protected dma-heap by userspace. We can then reset the FW and
> > allow it to operate in protected mode after that point.
>
> Urgh, I'd rather avoid that dance if possible :)

Me too.

>
> > This approach has two downsides though:
> >
> > 1. We have no way of checking that the memory we're passed is actually
> >    suitable for FW execution in a protected context. If we're passed
> >    random memory, this will likely hang the platform as soon as we
> >    enter protected mode.
>
> It's a current limitation of dma-buf in general, and you'd have the same
> issue right now if someone imports a buffer, or misconfigures the heap
> as a !protected heap.
>
> I'd really like to have some way to store some metadata in dma_buf, if
> only to tell that the buffer is protected.

The dma_buf has a pointer to its ops, so it should be relatively easy to
add an is_dma_buf_coming_from_this_heap() helper. Of course this implies
linking the consumer driver to the heap it's supposed to take protected
buffers from, which is basically the thing being discussed here :-).
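A minimal sketch of what that helper could look like, assuming the heap
(or a small header it provides) exposes its dma_buf_ops so consumers can
compare against it; the name comes from the suggestion above and the
signature is illustrative only:

#include <linux/dma-buf.h>

/* An exporter is identified by the ops it installed on the buffer, so
 * checking the buffer's origin boils down to comparing the ops pointer
 * against the (hypothetically exported) heap ops.
 */
static bool is_dma_buf_coming_from_this_heap(struct dma_buf *buf,
					     const struct dma_buf_ops *heap_ops)
{
	return buf->ops == heap_ops;
}
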
>
> I suspect you'd also need that if you do things like protected video
> playback through a codec, get a protected frame, and want to import that
> into the GPU. Depending on how you allocate it, either the codec or the
> GPU or both will want to make sure it's protected.

If it's all allocated from a central "protected" heap (even if that goes
through the driver calling dma_heap_alloc_buffer()), it shouldn't be an
issue.

>
> > 2. If the driver has already booted the FW and exposed a DRI node, we
> >    might have GPU workloads running, and doing a FW reset might incur
> >    a slight delay in GPU job execution.
> >
> > I think #1 is a more general issue that applies to suspend buffers
> > allocated for GPU contexts too. If we expose ioctls where we take
> > protected memory buffers that can possibly lead to crashes if they are
> > not real protected memory regions, and we have no way to ensure the
> > memory is protected, we probably want to restrict these ioctls/modes to
> > some high-privilege CAP_SYS_.
> >
> > For #2, that's probably something we can live with, since it's a
> > one-shot thing. If it becomes an issue, we can even make sure we enable
> > the FW protected-mode before the GPU starts being used for real.
> >
> > This being said, I think the problem applies outside Panthor, and it
> > might be that the video codec can't reset the FW/HW block to switch to
> > protected mode as easily as Panthor.
> >
> > Note that there are also downsides to the reserved-memory node
> > approach, where some bootloader stage would ask the secure FW to
> > reserve a portion of memory and pass this through the DT. This sort of
> > thing tends to be an integration mess, where you need all the pieces
> > of the stack (TEE, u-boot, MTK dma-heap driver, gbm, ...) to be at a
> > certain version to work properly. If we go the ioctl() way, we
> > restrict the scope to the TEE, gbm/mesa and the protected-dma-heap
> > driver, which is still a lot, but we've ripped the bootloader out of
> > the equation at least.
>
> Yeah. I also think there are two discussions in parallel here:
>
> 1) Being able to allocate protected buffers from the driver
> 2) Exposing an interface to allocate those to userspace
>
> I'm not really convinced we need 2, but 1 is obviously needed from what
> you're saying.

I suspect we need #2 for GBM, still. But that's what dma-heaps are for,
so I don't think that's a problem.
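For reference, point 2) wouldn't need new uAPI on the heap side: userspace
(GBM/mesa) could already allocate from such a heap through the existing
dma-heap ioctl, along these lines. The "protected" heap name is
hypothetical and integration-dependent, and error handling is trimmed.

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

/* Allocate a dma-buf from a (hypothetical) "protected" heap using the
 * standard DMA_HEAP_IOCTL_ALLOC interface. The returned fd can then be
 * imported by the GPU or codec driver as a GEM object / protected frame.
 */
static int alloc_protected_dmabuf(size_t size)
{
	struct dma_heap_allocation_data data = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int ret, heap;

	heap = open("/dev/dma_heap/protected", O_RDONLY | O_CLOEXEC);
	if (heap < 0)
		return -1;

	ret = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap);

	return ret < 0 ? -1 : (int)data.fd;
}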