First, I don't think this whole series is ready for merge yet. It is however ready for a review, and I think a lot of the prep patches in the series could be merged to make my rebasing life a bit easier. I cannot continue ignoring pretty much all emails/bugs as I have for the last month to wrap up this series. The current state is on my IVB, things are pretty stable. I've seen one unexplained hang, but I'm hopeful review might help me uncover/explain. This patch series introduces the next step in enabling full PPGTT, which is per fd address space/context, and it also contains the previously unmerged patches (some of which have been reworked, modified, or rebased). In regards to the continued VMA changes, I think in these, the delta with regard to the last posting is the bound list was per VM. It is now global. I've also moved the active list to per VM. Brand new in the series we take the previous series' context per fd (with address space) one step further and actually switch the address spaces when we do context switches. In order to make this happen, the series continues to chip away at removing the notion of an object only ever being bound into one address space via the struct i915_address_space and struct i915_vma data structures which are really abstractions for a page directory, and current mapped ptes respectively. The error state is improved since the last series (though still some work there is probably needed). It also serves to remove the notion of the aliasing PPGTT since in theory everything bound into the GGTT shouldn't benefit from an aliasing PPGTT (fact check). With every context having it's own address space, and every open DRM fd having it's own context, it's trivial on execbuf to lookup a context and do the pinning in the proper address space. More importantly, it's implicit that a context exists, which made this impossible to do earlier. *A note on patch ordering:* In order to work this series incrementally, the final patch ordering is admittedly a little bit strange. I'm more than willing to rework these as requested, but I'd really prefer not to do really heavy reordering unless there is a major benefit, or of course to fix bugs. # What is not in this patch series in the order I think we should handle it (and I acknowledge some of this stuff is non-trivial): ## Review + QA coverage ## Porting to HSW It shouldn't be too much extra work, if any, to add support. I haven't looked into it yet. ## Better vm/ppgtt info in error state collection In particular, I want to dump all the PTEs at hang, and at the very least the guilt PTEs. This isn't difficult, and can be done with copypasta from the existing dumper I have. ## User space and the implications Now that contexts are valid on all rings, userspace should begin emitting the context for all rings if it expects both rings to be able to access both objects in the same offset. The mesa change looks to be just one line which simplies emits the context for batch->is_blit, I doubt libva is using contexts, and SNA seems not to. The plan to support mesa will be to do the detection in libdrm, and go ahead with the simple mesa one liner. I've been using the oneliner if mesa for a while now, but we will need to support old user space in the kernel. So there might be a bit of work even on the kernel side here. We also need some IGT tools test updates. I have messy versions of these locally already. ## Performance data I think it doesn't preclude preliminary review of the patches since the main goal of PPGTT is really abourt security, correctness, and enabling other things. I will update with some numbers after I work on it a bit more. ## Testing on SNB If our current code is correct, then I think these patches might work on SNB as is, but it's untested. There is currently no way to disconnect contexts + PPGTT from the whole thing; so if this doesn't work - we'll need rework some of the code. I think it should just entail bringing back aliasing ppgtt, and not doing the address space switch when switching contexts (aliasing ppgtt will have a null switch_mm()). ## Soft pin interface I'd like to defer the discussion until these patches are merged. ## On demand page table allocation This is a potentially very useful optimization for at least the following reasons: * any app using contexts will have an extra set of page tables it isn't using; wasted memory * Reduce DCLV to reduce pd fetch latency * Allows Better use of a switch to default context for low memory situations (evicting unused page tables for example) ## Per VMA cache levels/control There are situations in the code where we have to flush the GPU pipeline in order to change cache levels. This should no longer be the case for unaffected VMs (I think). The same may be true with domain tracking. ## dmabuf/prime integration I haven't looked into what's missing to support it. If I'm lucky, it just works. With that, if you haven't already moved on, chanting tl;dr - all comments welcome. --- Ben Widawsky (65): drm/i915: Remove extra error state NULL drm/i915: Extract error buffer capture drm/i915: make PDE|PTE platform specific drm/i915: Don't clear gtt with 0 entries drm/i915: Conditionally use guard page based on PPGTT drm/i915: Use drm_mm for PPGTT PDEs drm/i915: cleanup context fini drm/i915: Do a fuller init after reset drm/i915: Split context enabling from init drm/i915: destroy i915_gem_init_global_gtt drm/i915: Embed PPGTT into the context drm/i915: Unify PPGTT codepaths on gen6+ drm/i915: Move ppgtt initialization down drm/i915: Tie context to PPGTT drm/i915: Really share scratch page drm/i915: Combine scratch members into a struct drm/i915: Drop dev from pte_encode drm/i915: Use gtt shortform where possible drm/i915: Move fbc members out of line drm/i915: Move gtt and ppgtt under address space umbrella drm/i915: Move gtt_mtrr to i915_gtt drm/i915: Move stolen stuff to i915_gtt drm/i915: Move aliasing_ppgtt drm/i915: Put the mm in the parent address space drm/i915: Move active/inactive lists to new mm drm/i915: Create a global list of vms drm/i915: Remove object's gtt_offset drm: pre allocate node for create_block drm/i915: Getter/setter for object attributes drm/i915: Create VMAs (part 1) drm/i915: Create VMAs (part 2) - kill gtt space drm/i915: Create VMAs (part 3) - plumbing drm/i915: Create VMAs (part 3.5) - map and fenceable tracking drm/i915: Create VMAs (part 4) - Error capture drm/i915: Create VMAs (part 5) - move mm_list drm/i915: Create VMAs (part 6) - finish error plumbing drm/i915: create an object_is_active() drm/i915: Move active to vma drm/i915: Track all VMAs per VM drm/i915: Defer request freeing drm/i915: Clean up VMAs before freeing drm/i915: Replace has_bsd/blt with a mask drm/i915: Catch missed context unref earlier drm/i915: Add a context open function drm/i915: Permit contexts on all rings drm/i915: Fix context fini refcounts drm/i915: Better reset handling for contexts drm/i915: Create a per file_priv default context drm/i915: Remove ring specificity from contexts drm/i915: Track which ring a context ran on drm/i915: dump error state based on capture drm/i915: PPGTT should take a ppgtt argument drm/i915: USE LRI for switching PP_DIR_BASE drm/i915: Extract mm switching to function drm/i915: Write PDEs at init instead of enable drm/i915: Disallow pin with full ppgtt drm/i915: Get context early in execbuf drm/i915: Pass ctx directly to switch/hangstat drm/i915: Actually add the new address spaces drm/i915: Use multiple VMs drm/i915: Kill now unused ppgtt_{un,}bind drm/i915: Add PPGTT dumper drm/i915: Dump all ppgtt drm/i915: Add debugfs for vma info per vm drm/i915: Getparam full ppgtt Chris Wilson (1): drm: Optionally create mm blocks from top-to-bottom drivers/gpu/drm/drm_mm.c | 134 +++--- drivers/gpu/drm/i915/i915_debugfs.c | 215 ++++++++-- drivers/gpu/drm/i915/i915_dma.c | 25 +- drivers/gpu/drm/i915/i915_drv.c | 57 ++- drivers/gpu/drm/i915/i915_drv.h | 353 ++++++++++------ drivers/gpu/drm/i915/i915_gem.c | 639 +++++++++++++++++++++-------- drivers/gpu/drm/i915/i915_gem_context.c | 279 +++++++++---- drivers/gpu/drm/i915/i915_gem_debug.c | 11 +- drivers/gpu/drm/i915/i915_gem_evict.c | 64 +-- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 138 ++++--- drivers/gpu/drm/i915/i915_gem_gtt.c | 541 ++++++++++++++---------- drivers/gpu/drm/i915/i915_gem_stolen.c | 87 ++-- drivers/gpu/drm/i915/i915_gem_tiling.c | 21 +- drivers/gpu/drm/i915/i915_irq.c | 197 ++++++--- drivers/gpu/drm/i915/i915_trace.h | 38 +- drivers/gpu/drm/i915/intel_display.c | 40 +- drivers/gpu/drm/i915/intel_drv.h | 7 - drivers/gpu/drm/i915/intel_fb.c | 8 +- drivers/gpu/drm/i915/intel_overlay.c | 26 +- drivers/gpu/drm/i915/intel_pm.c | 65 +-- drivers/gpu/drm/i915/intel_ringbuffer.c | 32 +- drivers/gpu/drm/i915/intel_sprite.c | 8 +- include/drm/drm_mm.h | 147 ++++--- include/uapi/drm/i915_drm.h | 1 + 24 files changed, 2044 insertions(+), 1089 deletions(-) -- 1.8.3.1