On Thu, Jul 16, 2015 at 10:33:12AM +0100, Michel Thierry wrote:
> This clean-up version delays the 48-bit work to later patches and includes
> other review comments from Akash and Chris Wilson. The first 4 patches
> prepare the dynamic page allocation code to handle independent pdps, but
> no specific code for 48-bit mode is added before the 5th patch.
>
> In order to expand the GPU address space, a 4th level translation is added,
> the Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E),
> PML4[0-511], each pointing to a PDP. All the existing "dynamic alloc
> ppgtt" functions are used, only adding the 4th level changes. I also
> updated some remaining variables that were 32b only.
>
> There are 2 hardware workarounds needed to allow correct operation with
> 48b addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset).
> This new patchset version includes the comments and suggestions from Chris
> Wilson. A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given
> object can be allocated outside the first 4 PDPs; if not, the end range is
> forced to 4GB. Also, more objects now use the DRM_MM_CREATE_TOP flag. To
> maintain compatibility, in libdrm I added a new drm_intel_bo_emit_reloc_48bit
> function that will flag these objects, while the existing drm_intel_bo_emit_reloc
> clears it.
>
> Finally, this feature is only available in BDW and Gen9, requires LRC
> submission mode (execlists) and it can be detected by i915.enable_ppgtt=3.
>
> Also note that this expanded address space is only available for full
> PPGTT, aliasing PPGTT and Global GTT remain 32-bit.

A test I just thought of is to extend gem_evict_alignment to iterate over

	for (align = 1ULL << 12; align < 1ULL << 48; align <<= 1)
		exec(obj.align = align);

i.e. basically force the kernel to place the object in every power-of-two
zone.
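To make it concrete which page-table levels such a sweep would reach, here is
a minimal sketch that splits a 48-bit address into per-level indices. It
assumes 4 KiB pages and 9 index bits (512 entries) at every level, so that
12 + 4 * 9 = 48; the cover letter only states the 512-entry size for the PML4
itself. The struct and function names are hypothetical and are not taken from
the patches, and the sweep assumes each object lands exactly at its alignment
value, which the kernel is not obliged to do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustrative, not from the patch series):
 * 4 KiB pages, 9 index bits per level, 48 address bits in total. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)
#define ADDR_BITS  48

struct va_index {
	unsigned pml4e; /* address bits 47:39 */
	unsigned pdpe;  /* address bits 38:30 */
	unsigned pde;   /* address bits 29:21 */
	unsigned pte;   /* address bits 20:12 */
};

/* Split a 48-bit virtual address into its four table indices. */
static struct va_index va_decompose(uint64_t addr)
{
	struct va_index idx;

	assert(addr < (1ULL << ADDR_BITS));

	idx.pte   = (addr >> PAGE_SHIFT) & LEVEL_MASK;
	idx.pde   = (addr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
	idx.pdpe  = (addr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK;
	idx.pml4e = (addr >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK;
	return idx;
}

/* Walk the same power-of-two sweep as the proposed test and count the
 * alignments whose address falls outside PML4 entry 0. */
static unsigned sweep_count_nonzero_pml4e(void)
{
	unsigned count = 0;
	uint64_t align;

	for (align = 1ULL << PAGE_SHIFT; align < (1ULL << ADDR_BITS); align <<= 1)
		if (va_decompose(align).pml4e != 0)
			count++;
	return count;
}
```

Under these assumptions, only alignments of 1ULL << 39 and above select a
PML4 entry other than 0, so most of the sweep stays inside the first PDP and
mainly exercises the lower three levels.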
The idea here is to exercise and allocate as much of the 4-level page table
handling code as is trivially possible. (To work on extents tracking you could
leave each level in place, though now this is starting to feel more like a
gem_ppgtt test.) Using softpin would give us direct control over exercising
every boundary in the code, but then the test requires softpin.

I also noticed that constructing the bitmaps for va_alloc_range tracking was
very expensive, even in the trivial no-op case (rebinding to the same
location). A benchmark to measure that allocation overhead would be very
useful. For that I think a synthetic test, such as using softpin to move an
object through the entire address space or even just flip it between two
locations, would do the job.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx