These patches live here, based on my temporary Broadwell branch:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=dynamic_pt_alloc

First, and most importantly, this work should have no impact on current
drm-intel code because PPGTT is currently shut off there. To actually
test this patch series, one must re-enable PPGTT. On a single run of IGT
on IVB, it seems this doesn't introduce any regressions, but y'know, it's
PPGTT, so there's some instability, and it's hard to claim for certain
this doesn't break anything on top. Also, as stated below, the gen8 work
is only partially done.

Before I go too much further with this, I wanted to get eyes on it. I am
really open to any feedback. Before you do request a change though,
please realize that I've gone through several iterations of the
functions/interfaces. So please, spare me some pain and try to think
through what your request is before rattling it off.

Daniel has expressed to me already that he is unwilling to merge certain
things until PPGTT problems are fixed and it can be enabled by default.
That's okay. In my opinion, many of the patches don't have any major
behavioral changes, and make the code so much more readable and easier
to deal with that I believe merging them would only improve PPGTT
debugging in the future. There are several cleanups in the series which
could also go in relatively harmlessly.

Okay, so what does this do?
The patch series /dynamicizes/ page table allocation and teardown for
GEN7. It also starts to introduce GEN8, but the tricky stuff is still
not done. Up until now, all our page tables have been pre-allocated when
the address space is created. That's actually okay for current GENs
since we don't use many address spaces, and the page tables occupy only
2MB per address space. However, on GEN8 we can use a deeper page table,
and preallocating such an address space would be very costly.

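To put rough numbers on the preallocation cost, here is a minimal
sketch. It assumes 4KB page tables and 512 page tables per GEN7 address
space, which is consistent with the 2MB figure above. The helper names,
the constant name, and the "deeper" layout with 4 page directories are
illustrative assumptions and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every page table occupies exactly one 4KB page. */
#define SKETCH_PT_BYTES 4096ULL

/*
 * Number of leaf page tables for a layout with `pds` page directories of
 * `pdes_per_pd` entries each, one page table per directory entry.
 */
uint64_t num_page_tables(uint64_t pds, uint64_t pdes_per_pd)
{
	return pds * pdes_per_pd;
}

/* Bytes of memory consumed by `tables` allocated page tables. */
uint64_t pt_bytes(uint64_t tables)
{
	/* Guard against overflow in the multiplication below. */
	assert(tables <= UINT64_MAX / SKETCH_PT_BYTES);
	return tables * SKETCH_PT_BYTES;
}
```

With these assumptions, pt_bytes(num_page_tables(1, 512)) is 2097152
bytes (the 2MB above), the hypothetical 4-directory layout comes to
8388608 bytes, and an address space that has only touched 3 page tables
needs pt_bytes(3), or 12288 bytes, when tables are allocated on demand.
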
This work was done for GEN7 first because this is the most well tested
with full PPGTT, and stable platforms are readily available. In this
patch series, I've demonstrated how we will manage tracking used page
tables (bitmaps), and broken things out into much more discrete
functions. I'm hoping I'll get feedback on the way I've implemented
things (primarily if it seems fundamentally flawed in any way). The real
goal was to prove out the dynamic allocation so we can begin to enable
GEN8 in the same way. I'll emphasize now that I put in a lot of effort
to limit risk with each patch, and this does result in some excess
churn.

My next step is to bring GEN8 up to par with GEN7. Once GEN8 is working
and clean, we can find where GEN7 and GEN8 overlap, and then recombine
where I haven't done so already. It's possible this plan will not work
out, and the above 2 steps will end up as one. After that, I plan to
merge the VA range allocation and teardown into the insert/clear entries
(currently it's two steps). I think both of those steps should be
distinct.

On x86 code overlap:
I spent more time than I would have liked trying to conjoin our
pagetable management with x86 code. In the end I decided not to depend
on any of the x86 definitions (other than PAGE_SIZE) because I found the
maze of conditional compiles and defines a bit too cumbersome. I also
didn't feel the abstract pagetable topology used in x86 code was
worthwhile given that with about 6 #defines, we achieve the same thing.
We just don't support nearly as many configurations, and our page table
format differs in too many places.

One thing I had really considered, and toyed around with, was not having
data structures to track the page tables we've allocated and simply
using the one that's in memory (which is what x86 does). I was not able
to make this work because of IOMMU. The address we write into our page
tables is an IOMMU address.
This means we need to know, or be able to easily derive, both the
physical address (or pfn, or struct page) and the DMA address. I failed
to accomplish this. I think using the bitmaps should be a faster way
than having to kmap the pagetables to determine their status anyway.

And, one thing to keep in mind is that currently we don't have any GPU
faulting capability. This will greatly limit the ability to map things
sparsely, which will also greatly limit the effective virtual address
space we can use.

Ben Widawsky (26):
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: Extract switch to default context
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: Wrap VMA binding
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Split out gtt specific header file
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915: Print used ppgtt pages for gen6 in debugfs
  FOR REFERENCE ONLY

 drivers/gpu/drm/i915/i915_debugfs.c        |  47 +-
 drivers/gpu/drm/i915/i915_drv.h            | 169 +----
 drivers/gpu/drm/i915/i915_gem.c            |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  25 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 995 +++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        | 417 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   1 -
 drivers/gpu/drm/i915/i915_trace.h          | 108 ++++
 9 files changed, 1198 insertions(+), 584 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h

-- 
1.9.0