On Wed, May 23, 2018 at 10:19 PM, Christian König
<christian.koenig@xxxxxxx> wrote:
> On 23.05.2018 at 16:13, Qiang Yu wrote:
>>
>> On Wed, May 23, 2018 at 9:59 PM, Christian König
>> <christian.koenig@xxxxxxx> wrote:
>>>
>>> On 23.05.2018 at 15:52, Qiang Yu wrote:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>>>>>
>>>>> On 18.05.2018 at 11:27, Qiang Yu wrote:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>>>>>   OpenGL vertex shader processing and PP is for fragment shader
>>>>>>   processing. Each processor has its own MMU, so the processors work
>>>>>>   in virtual address space.
>>>>>> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>>>>>>   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>>>>>   together to handle a single fragment shader task divided by
>>>>>>   FB output tiled pixels. The Mali 400 user space driver is
>>>>>>   responsible for assigning target tiled pixels to each PP, but Mali
>>>>>>   450 has a HW module called DLBU to dynamically balance each
>>>>>>   PP's load.
>>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>>   GPU virtual address space, uploads the command stream and draw data
>>>>>>   through a CPU mmap of the buffer object, then submits the task to
>>>>>>   GP/PP with a register frame indicating where the command stream is
>>>>>>   plus misc settings.
>>>>>> - There's no command stream validation/relocation because each user
>>>>>>   process has its own GPU virtual address space. The GP/PP MMU
>>>>>>   switches virtual address space before running two tasks from
>>>>>>   different user processes. Erroneous or evil user space code just
>>>>>>   gets an MMU fault or GP/PP error IRQ, then the HW/SW is recovered.
>>>>>> - Use TTM as MM.
>>>>>>   TTM_PL_TT type memory is used as the content of a
>>>>>>   lima buffer object, which is allocated from the TTM page pool. All
>>>>>>   lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>>   allocation, so there's no buffer eviction and swap for now. We
>>>>>>   need reverse engineering to see if and how GP/PP support MMU
>>>>>>   fault recovery (continue execution). Otherwise we have to
>>>>>>   pin/unpin each involved buffer at task creation/deletion.
>>>>>
>>>>>
>>>>> Well pinning all memory is usually a no-go for upstreaming. But since
>>>>> you are already using the drm_sched for GPU task scheduling why are
>>>>> you actually needing this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic and that is enough for TTM to note
>>>>> that a buffer object is movable again (e.g. unpin them).
>>>>
>>>> Please correct me if I'm wrong.
>>>
>>>
>>> Well, you are wrong :)
>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task, but this won't
>>>> prevent it from eviction/swap when executing, so a GPU MMU fault may
>>>> happen and in the handler we need to recover the buffer
>>>> evicted/swapped.
>>>>
>>>> Another way is to pin/unpin the buffers involved at task create/free.
>>>>
>>>> The first way is better when memory load is low and the second way is
>>>> better when memory load is high. The first way also needs less memory.
>>>>
>>>> So I'd prefer the first way, but because the GPU MMU fault
>>>> HW op needs reverse engineering, I have to pin all buffers now. After
>>>> the HW op is clear, I can choose one way to implement.
>>>
>>>
>>> The general approach is:
>>> 1.) Lock all BOs
>>> 2.) Validate all BOs
>>> 3.) Add the fence
>>> 4.) Unlock the BOs
>>
>> This is the task prepare process, right?
>
>
> Yes.
>
>>> BOs can't be evicted while they are locked
>>
>> During the task prepare stage they're locked, but after the task is
>> queued they get unlocked and become evictable?
>
>
> Yes, the fence you added to the BO prevents TTM from evicting the BO until
> the fence has signaled.
>
>>
>>> and since you already add the
>>> fence that should be perfectly sufficient to prevent it from being
>>> evicted until your operation is completed.
>>
>> You mean I have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
>> at task creation, or TTM will check the buffer's reservation object and
>> won't evict it if it sees a fence?
>
>
> The second. You *don't* have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
> as long as you always add the correct fence with your command submissions.
>
> When evicting something TTM will take a look at the fences assigned to the
> BO and either not evict it at all or wait for all fences to be completed
> before doing so.
>
> When you need to update some internal state or flush caches or stuff like
> that when a BO is evicted TTM also has callbacks for this.

Oh, thanks for clearing this up for me, it makes my life easier.

Regards,
Qiang

>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Qiang
>>
>>> Using the MMU is certainly better in general, but usually only
>>> optional and a pain in the ass to get working. We have that in amdgpu
>>> for quite a while as well now and still don't use it because of that.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>> Regards,
>>>> Qiang
>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>>>>>>   have a lima context object in the kernel to distinguish tasks
>>>>>>   from different users. drm_sched gets tasks from each lima context
>>>>>>   in a fair way.
>>>>>>
>>>>>> Not implemented:
>>>>>> - Dump buffer support
>>>>>> - Power management
>>>>>> - Performance counter
>>>>>>
>>>>>> This patch series just packs a pair of .c/.h files in each patch.
>>>>>> For the whole history of this driver's development, see:
>>>>>> https://github.com/yuq/linux-lima/commits/lima-4.17-rc4
>>>>>>
>>>>>> The Mesa driver is still in development and not ready for daily usage,
>>>>>> but can run some simple tests like kmscube and glmark2, see:
>>>>>> https://github.com/yuq/mesa-lima
>>>>>>
>>>>>> Andrei Paulau (1):
>>>>>>   arm64/dts: add switch-delay for meson mali
>>>>>>
>>>>>> Lima Project Developers (10):
>>>>>>   drm/lima: add mali 4xx GPU hardware regs
>>>>>>   drm/lima: add lima core driver
>>>>>>   drm/lima: add GPU device functions
>>>>>>   drm/lima: add PMU related functions
>>>>>>   drm/lima: add PP related functions
>>>>>>   drm/lima: add MMU related functions
>>>>>>   drm/lima: add GPU virtual memory space handing
>>>>>>   drm/lima: add GEM related functions
>>>>>>   drm/lima: add GEM Prime related functions
>>>>>>   drm/lima: add makefile and kconfig
>>>>>>
>>>>>> Qiang Yu (12):
>>>>>>   dt-bindings: add switch-delay property for mali-utgard
>>>>>>   arm64/dts: add switch-delay for meson mali
>>>>>>   Revert "drm: Nerf the preclose callback for modern drivers"
>>>>>>   drm/lima: add lima uapi header
>>>>>>   drm/lima: add L2 cache functions
>>>>>>   drm/lima: add GP related functions
>>>>>>   drm/lima: add BCAST related function
>>>>>>   drm/lima: add DLBU related functions
>>>>>>   drm/lima: add TTM subsystem functions
>>>>>>   drm/lima: add buffer object functions
>>>>>>   drm/lima: add GPU schedule using DRM_SCHED
>>>>>>   drm/lima: add context related functions
>>>>>>
>>>>>> Simon Shields (1):
>>>>>>   ARM: dts: add gpu node to exynos4
>>>>>>
>>>>>>  .../bindings/gpu/arm,mali-utgard.txt        |   4 +
>>>>>>  arch/arm/boot/dts/exynos4.dtsi              |  33 ++
>>>>>>  arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi |   1 +
>>>>>>  .../boot/dts/amlogic/meson-gxl-mali.dtsi    |   1 +
>>>>>>  drivers/gpu/drm/Kconfig                     |   2 +
>>>>>>  drivers/gpu/drm/Makefile                    |   1 +
>>>>>>  drivers/gpu/drm/drm_file.c                  |   8 +-
>>>>>>  drivers/gpu/drm/lima/Kconfig                |   9 +
>>>>>>  drivers/gpu/drm/lima/Makefile               |  19 +
>>>>>>  drivers/gpu/drm/lima/lima_bcast.c           |  65 +++
>>>>>>  drivers/gpu/drm/lima/lima_bcast.h           |  34 ++
>>>>>>  drivers/gpu/drm/lima/lima_ctx.c             | 143 +++++
>>>>>>  drivers/gpu/drm/lima/lima_ctx.h             |  51 ++
>>>>>>  drivers/gpu/drm/lima/lima_device.c          | 407 ++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_device.h          | 136 +++++
>>>>>>  drivers/gpu/drm/lima/lima_dlbu.c            |  75 +++
>>>>>>  drivers/gpu/drm/lima/lima_dlbu.h            |  37 ++
>>>>>>  drivers/gpu/drm/lima/lima_drv.c             | 466 ++++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_drv.h             |  77 +++
>>>>>>  drivers/gpu/drm/lima/lima_gem.c             | 459 ++++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_gem.h             |  41 ++
>>>>>>  drivers/gpu/drm/lima/lima_gem_prime.c       |  66 +++
>>>>>>  drivers/gpu/drm/lima/lima_gem_prime.h       |  31 ++
>>>>>>  drivers/gpu/drm/lima/lima_gp.c              | 293 +++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_gp.h              |  34 ++
>>>>>>  drivers/gpu/drm/lima/lima_l2_cache.c        |  98 ++++
>>>>>>  drivers/gpu/drm/lima/lima_l2_cache.h        |  32 ++
>>>>>>  drivers/gpu/drm/lima/lima_mmu.c             | 154 ++++++
>>>>>>  drivers/gpu/drm/lima/lima_mmu.h             |  34 ++
>>>>>>  drivers/gpu/drm/lima/lima_object.c          | 120 +++++
>>>>>>  drivers/gpu/drm/lima/lima_object.h          |  87 +++
>>>>>>  drivers/gpu/drm/lima/lima_pmu.c             |  85 +++
>>>>>>  drivers/gpu/drm/lima/lima_pmu.h             |  30 ++
>>>>>>  drivers/gpu/drm/lima/lima_pp.c              | 418 +++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_pp.h              |  37 ++
>>>>>>  drivers/gpu/drm/lima/lima_regs.h            | 304 +++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_sched.c           | 497 ++++++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_sched.h           | 126 +++++
>>>>>>  drivers/gpu/drm/lima/lima_ttm.c             | 409 ++++++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_ttm.h             |  44 ++
>>>>>>  drivers/gpu/drm/lima/lima_vm.c              | 312 +++++++++++
>>>>>>  drivers/gpu/drm/lima/lima_vm.h              |  73 +++
>>>>>>  include/drm/drm_drv.h                       |  23 +-
>>>>>>  include/uapi/drm/lima_drm.h                 | 195 +++++++
>>>>>>  44 files changed, 5565 insertions(+), 6 deletions(-)
>>>>>>  create mode 100644 drivers/gpu/drm/lima/Kconfig
>>>>>>  create mode 100644 drivers/gpu/drm/lima/Makefile
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_bcast.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_bcast.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_ctx.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_ctx.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_device.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_device.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_dlbu.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_dlbu.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_drv.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_drv.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gem.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gem.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gp.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_gp.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_mmu.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_mmu.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_object.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_object.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_pmu.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_pmu.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_pp.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_pp.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_regs.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_sched.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_sched.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_ttm.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_ttm.h
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_vm.c
>>>>>>  create mode 100644 drivers/gpu/drm/lima/lima_vm.h
>>>>>>  create mode 100644 include/uapi/drm/lima_drm.h
>>>>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
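
For reference, the lock / validate / add fence / unlock sequence discussed in this thread, and the way a fence keeps a BO from being evicted without explicit pinning, can be modelled roughly as below. This is only a user-space toy sketch in Python, not kernel code and not the TTM API: Fence, BufferObject, submit() and try_evict() are hypothetical names made up for illustration. One simplification to keep in mind: the toy evictor simply refuses while a fence is pending, whereas TTM, as described above, may either skip the BO or wait for its fences to complete.

```python
import threading
from dataclasses import dataclass, field


class Fence:
    """Toy stand-in for a GPU fence; signaled once the job has finished."""

    def __init__(self):
        self.signaled = False

    def signal(self):
        self.signaled = True


@dataclass
class BufferObject:
    """Toy buffer object; 'resident' models having backing memory in place."""
    name: str
    resident: bool = False
    fences: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def busy(self):
        # A BO is busy while any fence attached to it is still unsignaled.
        return any(not f.signaled for f in self.fences)


def submit(bos):
    """Steps 1-4 from the thread: lock, validate, add the fence, unlock."""
    fence = Fence()
    ordered = sorted(bos, key=id)        # fixed lock order, avoids deadlock here
    for bo in ordered:
        bo.lock.acquire()                # 1. lock all BOs
    try:
        for bo in ordered:
            bo.resident = True           # 2. validate all BOs
        for bo in ordered:
            bo.fences.append(fence)      # 3. add the fence
    finally:
        for bo in ordered:
            bo.lock.release()            # 4. unlock the BOs
    return fence


def try_evict(bo):
    """Evict only if the BO is not locked and has no unsignaled fence."""
    if not bo.lock.acquire(blocking=False):
        return False
    try:
        if bo.busy():
            return False
        bo.resident = False
        return True
    finally:
        bo.lock.release()
```

With this model, calling try_evict() on a BO right after submit() is refused, and the same call succeeds once the returned fence has been signaled, which is the behaviour that makes explicit pinning unnecessary as long as every submission attaches its fence.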