On Thu, May 24, 2018 at 8:27 AM, Christian König <christian.koenig@xxxxxxx> wrote:
> On 24.05.2018 at 02:31, Qiang Yu wrote:
>>
>> On Wed, May 23, 2018 at 11:44 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
>>>
>>> On Wed, May 23, 2018 at 3:52 PM, Qiang Yu <yuq825@xxxxxxxxx> wrote:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>>>>
>>>>> On 18.05.2018 at 11:27, Qiang Yu wrote:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>>>>>   OpenGL vertex shader processing and PP is for fragment shader
>>>>>>   processing. Each processor has its own MMU, so the processors work
>>>>>>   in virtual address space.
>>>>>> - There is only one GP but multiple PPs (max 4 for Mali 400 and 8
>>>>>>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>>>>>   together to handle a single fragment shader task, divided by
>>>>>>   FB output tiled pixels. The Mali 400 user space driver is
>>>>>>   responsible for assigning target tiled pixels to each PP, but
>>>>>>   Mali 450 has a HW module called DLBU to dynamically balance each
>>>>>>   PP's load.
>>>>>> - The user space driver allocates a buffer object and maps it into
>>>>>>   the GPU virtual address space, uploads the command stream and draw
>>>>>>   data through a CPU mmap of the buffer object, then submits a task
>>>>>>   to GP/PP with a register frame indicating where the command stream
>>>>>>   is and misc settings.
>>>>>> - There is no command stream validation/relocation because each user
>>>>>>   process has its own GPU virtual address space. The GP/PP MMUs
>>>>>>   switch virtual address space before running two tasks from
>>>>>>   different user processes. Erroneous or malicious user space code
>>>>>>   just gets an MMU fault or GP/PP error IRQ, and then the HW/SW is
>>>>>>   recovered.
>>>>>> - Use TTM as MM. TTM_PL_TT type memory is used as the content of a
>>>>>>   lima buffer object, which is allocated from the TTM page pool. All
>>>>>>   lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>>   allocation time, so there is no buffer eviction and swap for now.
>>>>>>   We need reverse engineering to see if and how GP/PP support MMU
>>>>>>   fault recovery (continue execution). Otherwise we have to
>>>>>>   pin/unpin each involved buffer at task creation/deletion.
>>>>>
>>>>> Well, pinning all memory is usually a no-go for upstreaming. But
>>>>> since you are already using drm_sched for GPU task scheduling, why
>>>>> do you actually need this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic, and that is enough for TTM to note
>>>>> that a buffer object is movable again (e.g. unpin them).
>>>>
>>>> Please correct me if I'm wrong.
>>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task. But this won't
>>>> prevent eviction/swap while the task is executing, so a GPU MMU fault
>>>> may happen, and in the fault handler we need to bring back the
>>>> evicted/swapped buffer.
>>>>
>>>> Another way is to pin/unpin the involved buffers at task create/free.
>>>>
>>>> The first way is better when memory load is low and the second way is
>>>> better when memory load is high. The first way also needs less memory.
>>>>
>>>> So I'd prefer the first way, but because the GPU MMU fault HW op still
>>>> needs reverse engineering, I have to pin all buffers for now. After
>>>> the HW op is clear, I can choose one way to implement.
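For reference, the pin-everything scheme described above essentially
reduces to a single-entry TT placement with NO_EVICT set. Below is a
minimal sketch against the 2018-era TTM API; the lima_bo wrapper and the
function name are made up for illustration, only the ttm_place /
ttm_placement structs and the TTM_PL_* flags are the real interface.

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

/* Hypothetical BO wrapper, not taken from the actual lima patches. */
struct lima_bo {
	struct ttm_buffer_object tbo;
	struct ttm_place place;
	struct ttm_placement placement;
};

/* Pin the BO in system (TT) memory at allocation time. */
static void lima_bo_placement_pinned(struct lima_bo *bo)
{
	bo->place.fpfn = 0;
	bo->place.lpfn = 0;
	/* NO_EVICT keeps TTM from ever evicting or swapping this BO. */
	bo->place.flags = TTM_PL_FLAG_TT | TTM_PL_FLAG_CACHED |
			  TTM_PL_FLAG_NO_EVICT;

	bo->placement.num_placement = 1;
	bo->placement.placement = &bo->place;
	bo->placement.num_busy_placement = 1;
	bo->placement.busy_placement = &bo->place;
}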
>>>
>>> All the drivers using ttm have something that looks like vram, or a
>>> requirement to move buffers around. Afaiui that includes the virtio
>>> drm driver.
>>
>> Does the virtio drm driver need to move buffers around? amdgpu also
>> has no vram on APUs.

Afaiui APUs have a range of stolen memory which looks and acts and is
managed like discrete vram, including moving buffers around.

>>> From your description you don't have such a requirement, and
>>> then doing what etnaviv has done would be a lot simpler. Everything
>>> that's not related to buffer movement handling is also available
>>> outside of ttm already.
>>
>> Yeah, I could do it like etnaviv, but that's not simpler than using ttm
>> directly, especially if I want some optimizations (like the ttm page
>> pool, ttm_eu_reserve_buffers, ttm_bo_mmap). If I have to/want to
>> implement them anyway, why not just use TTM directly with all those
>> helper functions?
>
> Well, TTM has some design flaws (e.g. heavily layered design etc...),
> but it also offers some rather nice functionality.

Yeah, but I still think that for non-discrete drivers just moving more of
the neat ttm functionality into helpers that everyone can use (instead of
the binary ttm y/n decision) would be much better. E.g. the allocator pool
definitely sounds like something the gem helpers should be able to do,
same for reserving a pile of buffers or default mmap implementations. A
lot of that also exists already, thanks to lots of effort from Noralf
Tronnes and others.

I think ideally the long-term goal would be to modularize ttm concepts as
much as possible, so that drivers can flexibly pick&choose the bits they
need. We're slowly getting there (but definitely not yet there if you need
to manage discrete vram, I think).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
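The "reserving a pile of buffers" mentioned above is what the
ttm_execbuf_util helpers already provide. Below is a minimal sketch of a
per-submit reserve/fence cycle, assuming a made-up lima_submit structure;
only struct ttm_validate_buffer and the ttm_eu_* calls are the existing
2018-era API.

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/ww_mutex.h>
#include <drm/ttm/ttm_execbuf_util.h>

/* Hypothetical per-submit bookkeeping, one validate entry per BO. */
struct lima_submit {
	struct ttm_validate_buffer *vbufs;
	unsigned int nr_vbufs;
	struct list_head validated;
	struct ww_acquire_ctx ticket;
};

static int lima_submit_lock_bos(struct lima_submit *submit)
{
	unsigned int i;

	INIT_LIST_HEAD(&submit->validated);
	for (i = 0; i < submit->nr_vbufs; i++)
		list_add_tail(&submit->vbufs[i].head, &submit->validated);

	/* ww_mutex-locks the whole list without deadlocking. */
	return ttm_eu_reserve_buffers(&submit->ticket, &submit->validated,
				      true, NULL);
}

/* After handing the job to drm_sched: attach its fence and unlock. */
static void lima_submit_unlock_bos(struct lima_submit *submit,
				   struct dma_fence *fence)
{
	if (fence)
		/* Publishes the fence on each BO and unreserves them. */
		ttm_eu_fence_buffer_objects(&submit->ticket,
					    &submit->validated, fence);
	else
		/* Submission failed, just drop the reservations. */
		ttm_eu_backoff_reservation(&submit->ticket,
					   &submit->validated);
}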