Just as before, these patches live on top of my Broadwell branch, here:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=gpu_mirror

These are the follow-on patches for [1].

This patch series brings 3 things:
1. Dynamic page table allocation for gen6-8
2. 64b (48b canonical) graphics virtual address space for Broadwell
3. An interface to specify a specific offset for a BO.

It's taken way longer than I thought to get this work done, and given the current state of our driver, I fear I may not have time to see this through to the end before I am pulled onto other things. If people want to send me smallish bugfixes, I will gladly do my best to incorporate them quickly. If there are more substantial change requests wrt design or patch reorganization, I will not be able to accommodate them. Someone else must take over this patch series at that point if they want these features. I do believe that everything up until the userptr patch is in decent shape though, so we'll see, I guess. (If you are qualified to take this over, and have interest, please let me know.)

The patch series is highly volatile and not manicured. I've run exactly 1 test on the GPU mirror (see below for what that means), though many more on the prior stuff. The series depends on full PPGTT, which is not yet enabled by default, and has a few outstanding issues. It also has been developed exclusively on pre-production hardware. I am only sending it out now because I will be on vacation for the next 10 days, and I know there are people that can benefit from this code before I return.

With that, I got the last parts of this working very recently, and they're very hackish. There are two reasons for the lack of refinement: I expect the interfaces for letting userspace dictate things to change (more on this later), and I simply ran out of time before my vacation.

Throughout development, I've been hitting issues, and I am not yet sure whether they are bugs in my code, bugs in full PPGTT, bugs in userptr, or general flakiness. There are a few patches in here marked TESTME that reflect this. Also, if you want to run this, I highly recommend turning off semaphores and rc6. (To be honest, I've not tried with them on recently.) You also need to turn on PPGTT since it is disabled by default.

    modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0

What you get in this series is what I'm going to call "GPU mirror". This patch series allows one to pick an arbitrary address for a GPU buffer object, and map it at that specific location within the GPU's address space. This is only possible because on Broadwell we get a 64b canonical GPU address space, and this allows us to map any CPU address as a GPU address. The obvious usage here is malloc(). malloc() returns a pointer that is valid on the CPU. Now that address can be identical on the GPU.

The interface provided is identical to the userptr interface previously posted by Chris Wilson. I've added a flag to that interface that indicates this new functionality. This is not necessarily the final version, and it's arguably not the best idea either. The reason for this choice is that we had users of userptr who wanted to try out this concept without having to do much porting.

To get to the userptr interface, I had to make a few things happen first. I needed to get dynamic page table allocation and teardown working. This was posted previously for gen6-7 [1] (with very rough code for gen8). I've now added more robust support for gen8 dynamic page table allocations. Doing the allocations dynamically was important because preallocating all 4 levels of page tables is not feasible in a real system. 4 level page tables are required in order to support the 64b canonical address space.

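To make the last point concrete, here is a small standalone sketch (not code from this series). It assumes 4 KiB pages, 512 entries per table at every level, and a 9/9/9/9/12 split of the 48b address, which is my understanding of the layout; all names in it (va_index, va_is_canonical, full_prealloc_bytes) are made up for illustration and do not exist in the driver. It shows how an address breaks down into the four table indices, what "canonical" means for a 48b address, and how much memory it would take to preallocate every table for one address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 512 entries per table, 4 levels, 48b VA. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)
#define VA_BITS    48
#define NUM_LEVELS 4

/* Index into the table at `level`: 0 is the page table, 3 is the top level. */
static unsigned int va_index(uint64_t addr, unsigned int level)
{
	assert(level < NUM_LEVELS);
	return (unsigned int)((addr >> (PAGE_SHIFT + LEVEL_BITS * level)) & LEVEL_MASK);
}

/* A 48b canonical address has bits 63..47 either all clear or all set. */
static bool va_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> (VA_BITS - 1);	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}

/* Bytes needed to preallocate every table at every level for one address space. */
static uint64_t full_prealloc_bytes(void)
{
	uint64_t tables = 0, per_level = 1;

	/* 1 top-level table, then 512x more tables at each lower level. */
	for (int level = NUM_LEVELS - 1; level >= 0; level--) {
		tables += per_level;
		per_level <<= LEVEL_BITS;
	}
	return tables << PAGE_SHIFT;	/* each table occupies one 4 KiB page */
}
```

Under these assumptions, an address built as (5 << 39) | (4 << 30) | (3 << 21) | (2 << 12) gives indices 5, 4, 3 and 2 from the top level down, and full_prealloc_bytes() comes out at roughly 513 GiB of page tables per address space, which is why the tables have to be allocated on demand.
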
With that all done, I was able to make a few minor hacks to userptr, take the intel-gpu-tools test from Tvrtko, and see at least one pass. FWIW, I am currently running:

    ./tests/gem_userptr_blits --run-subtest coherency-unsync

Since I feel the interface will likely change, I do not feel compelled to post either my libdrm or my IGT changes. If you want the modified test, let me know, as I don't think it's really relevant here.

One last thing. Intel GPU tools, as it stands today, makes a lot of assumptions about using an address space > 32b. I have not had time to fix this. It is something which needs fixing before this series could even be considered testable.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html

Ben Widawsky (54):
  drm/i915: Fix flush before context switch comment
  Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
  drm/i915: Wrap VMA binding
  drm/i915: Make pin global flags explicit
  drm/i915: Split out aliasing binds
  drm/i915: fix gtt_total_entries()
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Make the pdp switch a bit less hacky
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Scratch unused pages
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: Optimize PDP loads
  TESTME: Either drop the last patch or fix it.
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: implement alloc/teardown for 4lvl
  drm/i915/bdw: 4 level pages tables
  drm/i915: Restructure map vs. insert entries
  drm/i915/bdw: make aliasing PPGTT dynamic
  drm/i915: Expand error state's address width to 64b
  drm/i915/bdw: Flip the 48b switch
  TESTME: GFX_TLB_INVALIDATE_EXPLICIT
  TESTME: Always force invalidate
  drm/i915: Track userptr VMAs
  drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)

Chris Wilson (2):
  drm/i915: Prevent signals from interrupting close()
  drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

 drivers/gpu/drm/i915/Kconfig               |    1 +
 drivers/gpu/drm/i915/Makefile              |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  112 +-
 drivers/gpu/drm/i915/i915_dma.c            |   15 +-
 drivers/gpu/drm/i915/i915_drv.h            |   40 +-
 drivers/gpu/drm/i915/i915_gem.c            |   61 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   31 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c     |    5 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   22 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1810 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  354 +++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  767 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   21 +-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_trace.h          |  140 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 include/uapi/drm/i915_drm.h                |   20 +
 17 files changed, 2823 insertions(+), 580 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c

--
1.9.2