Rob Herring <robh@xxxxxxxxxx> writes:

> On Thu, Feb 7, 2019 at 9:51 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>>
>> On Thu, Feb 07, 2019 at 11:21:52PM +0800, Qiang Yu wrote:
>> > On Thu, Feb 7, 2019 at 5:09 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>> > >
>> > > On Wed, Feb 06, 2019 at 09:14:55PM +0800, Qiang Yu wrote:
>> > > > Kernel DRM driver for ARM Mali 400/450 GPUs.
>> > > >
>> > > > Since the last RFC, all feedback has been addressed. Most Mali DTS
>> > > > changes are already upstreamed by SoC maintainers. The kernel
>> > > > driver and user-kernel interface have been quite stable for several
>> > > > months, so I think it's ready to be upstreamed.
>> > > >
>> > > > This implementation mainly takes the amdgpu DRM driver as a reference.
>> > > >
>> > > > - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>> > > >   OpenGL vertex shader processing and PP is for fragment shader
>> > > >   processing. Each processor has its own MMU, so processors work in
>> > > >   virtual address space.
>> > > > - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>> > > >   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>> > > >   together to handle a single fragment shader task, divided by
>> > > >   FB output tiled pixels. The Mali 400 user space driver is
>> > > >   responsible for assigning target tiled pixels to each PP, but
>> > > >   Mali 450 has a HW module called DLBU to dynamically balance each
>> > > >   PP's load.
>> > > > - The user space driver allocates buffer objects and maps them into
>> > > >   the GPU virtual address space, uploads command streams and draw
>> > > >   data through a CPU mmap of the buffer object, then submits tasks
>> > > >   to GP/PP with a register frame indicating where the command
>> > > >   stream is, plus misc settings.
>> > > > - There's no command stream validation/relocation, because each
>> > > >   user process has its own GPU virtual address space. The GP/PP's
>> > > >   MMU switches virtual address space before running two tasks from
>> > > >   different user processes.
>> > > >   Errant or evil user space code just gets an MMU fault
>> > > >   or a GP/PP error IRQ, and then the HW/SW is recovered.
>> > > > - Use TTM as the MM. TTM_PL_TT type memory is used as the content
>> > > >   of lima buffer objects, which are allocated from the TTM page
>> > > >   pool. All lima buffer objects get pinned with
>> > > >   TTM_PL_FLAG_NO_EVICT at allocation, so there's no buffer
>> > > >   eviction or swap for now.
>> > >
>> > > All other render GPU drivers that have unified memory (aka are on the
>> > > SoC) use GEM directly, with some of the helpers we have. So msm,
>> > > etnaviv, vc4 (and i915 is kind of the same too, really). TTM makes
>> > > sense if you have some discrete memory to manage, but imo not in any
>> > > other place really.
>> > >
>> > > What's the design choice behind this?
>> > To be honest, it's just because TTM offers more helpers. I did implement
>> > a GEM way with CMA alloc at the beginning. But when implementing paged
>> > mem, I found TTM has mem pool alloc, sync, and mmap related helpers
>> > which cover much of my existing code. It's totally possible with GEM,
>> > but not as easy as TTM for me. And virtio-gpu seems to be an example of
>> > using TTM without discrete mem. Shouldn't TTM be a superset of both
>> > unified mem and discrete mem?
>>
>> virtio does have fake vram and migration afaiui. And sure, you can use
>> TTM without the vram migration, it's just that most of the complexity of
>> TTM is due to buffer placement and migration and all that stuff. If you
>> never need to move buffers, then you don't need any of that.
>>
>> Wrt the lack of helpers, what exactly are you looking for? A big part of
>> the issue with TTM is that it's a bit of a midlayer, so it reinvents a
>> bunch of things provided by e.g. the dma-api. It's cleaner to use the
>> dma-api directly. Basing the lima kernel driver on vc4, freedreno or
>> etnaviv (the last one is probably closest, since it doesn't have a
>> display block either) would be better I think.
>
> FWIW, I'm working on the panfrost driver and am using the shmem
> helpers from Noralf. It's in the early stages though. I started a patch
> for etnaviv to use them too, but found I need to rework them to
> sub-class the shmem GEM object.

Did you just convert the shmem helpers over to doing alloc_coherent? If
so, I'd be interested in picking them up for v3d, and that might help get
another patch out of your stack. I'm particularly interested in the shmem
helpers because I should start doing dynamic binding in and out of the
GPU's page table, to avoid pinning so much memory all the time.