A bunch of miscellaneous fixes for assertion failures and various
performance regressions when mixing new methods for offloads, along with
a couple of improvements for rendering with gen4.

 * Remove use of packed unnormalized texture coordinates on gen4/5, as
   these GPUs do not support unnormalized coordinates in the sampler.

 * Remove dependency upon x86 asm for cross-building to unsupported
   architectures.
   https://bugs.gentoo.org/show_bug.cgi?id=448570

 * Apply damage around PRIME updates in the correct order.

 * Correctly read the initial backlight level when the user overrides
   UXA's choice of backlight controller.

 * Throttle UXA to prevent it from queuing work much faster than the GPU
   can complete it. Unthrottled, this manifested as impossible
   performance figures and the entire display freezing for several
   seconds whilst the GPU caught up. It also caused the DDX to consume
   more memory than required, as it could not recycle buffers quickly
   enough, so in some cases the throttle produces a marked improvement
   in performance. Note that on gen2/3 this requires a new libdrm
   [2.4.41] in order to prevent a bug causing the DDX to fall back to
   swrast.

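For readers curious what dropping the x86 asm dependency can look like, here is a minimal sketch of an architecture-independent "find last set" helper in plain C. It is not the driver's actual __fls(): the name example_fls, the 0-based return convention, and the requirement that the argument be non-zero are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper, NOT the driver's __fls(): returns the 0-based
 * index of the most significant set bit in word.
 * Assumption: the caller never passes zero. */
static inline unsigned int example_fls(unsigned long word)
{
    unsigned int index = 0;

    assert(word != 0);

    /* Each right shift discards one bit; count the shifts needed
     * until nothing remains above the lowest bit position. */
    while (word >>= 1)
        index++;

    return index;
}
```

With this convention, example_fls(4096) gives 12. Where the compiler provides one, a builtin such as GCC's __builtin_clzl() could replace the loop; the loop is shown here only because it builds on any architecture.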
Chris Wilson (98):
      sna: Explicitly track self-relocation entries
      sna/gen6+: Tweak to only consider active ring on destination
      sna: Do not try and set a 0x0 mode
      sna/gen6+: Tidy up ring preferences
      sna/gen2,3: Remove gen-specific vertex_offset
      sna: Sanity check config->compat_output
      sna: Skip copying fbcon if we are already on the scanout
      sna: Mark kgem_bo_retire() as static
      sna: Only allocate a busy CPU bo for a GPU readback
      sna/gen4+: Tidy emit_spans_solid()
      sna/gen4+: Tidy emit_spans_affine()
      sna/gen4+: Trim an extraneous coordinate from solid span emission
      sna/gen4+: Trim an extraneous coordinate from solid composite emission
      sna/gen4+: Check for a spare exec slot for an outstanding vbo
      sna/gen3: Use inline transform+scale function
      sna: DBG compile fixes
      sna: Allow a flush to occur before batching a flush-bo
      sna: Move the primary color cache into the alpha cache
      sna/gen4+: Try using the BLT before doing a tiled copy
      sna/dri: Gracefully handle failures from pageflip
      sna: Seed the solid color cache with an invalid value to prevent false hits
      uxa: Align surface allocations to even tile rows
      sna/dri: Fix triple buffering to not penalise missed frames
      sna/dri: Use the default choice of backend for copying the region
      sna/gen6+: Hint that we prefer to use the BLT with uncached scanouts
      sna/gen4: Tweak single-thread SF w/a for solids
      sna/gen2: Tidy a pair of vertex emitters
      sna/gen2: Always try to use the BLT pipeline first
      sna: Tidy compat interfaces
      sna: Remove some obsolete Options
      sna: Micro-optimise glyph_valid()
      sna: Fast path inplace addition of solid trapezoids
      sna/gen6+: Remove vestigial CC viewport state
      sna/gen4+: Tidy special handling of 2s2s vertex elements
      sna/gen2+: Precompute the affine transformation scale factors
      sna/gen4+: Specialise linear vertex emission
      sna/gen6+: Fine tune placement of DRI copies
      sna: Fix off-by-one in C version of fls
      sna: Also recognise __i386__ for fls asm
      sna: Add a pair of asserts to validate fls()/cache_bucket()
      sna: Convert allocation request from bytes to num_pages when shrinking
      sna: Flush the batch prior to referencing work from another ring
      sna: Embed the pre-allocation of the static request into the device
      sna: Clear up the caches after handling a request allocation failure
      sna/trapezoids: filter out zero-length runs
      sna/trapezoids: filter out cancelling edges upon insertion to edge-list
      Revert "sna/gen4+: Backport tight vertex packing for simple renderblits"
      sna/gen4+: Trim the redundant float from the fill vertices
      sna/gen4+: Handle solids passed to the general texcoord emitter
      sna: Try to create userptr with the unsync'ed flag set first
      sna: Only disable upon a failed pageflip after at least one pipe flips
      sna/dri: Transfer the DRI2 reference to the new TearFree pixmap
      sna: fixup damage posting to be done correctly around slave pixmap
      sna: Open-code xf86CompatOutput() to avoid invalid pointers
      sna: Make sure all outputs are disabled if no CompatOutput is defined
      sna: With a GPU bo and a shm source, do not fall all the way back
      sna: Ignore the last pixmap cpu setting if overwritting all damage
      sna: Allow CPU bo to copy to GPU bo if the device is idle.
      sna: Prefer to use the GPU for copies from SHM onto tiled destinations
      sna: Use some surplus bits to back our temporary pixman_image_t
      intel: Throttle harder
      sna: Prefer userptr if copying to a tiled bo
      sna: Also prefer to use the GPU for uploads into a tiled bo
      sna: Disable memcpy_to_tiled_x() uploads on 32-bit systems
      sna: Reorder struct kgem_bo to move related data into the same cacheline
      sna/dri: Prefer to preserve the ring of the destination bo
      sna: After a size check, double check the batch before flushing
      sna: Tweak max object sizes to take account of aperture restrictions
      sna: Experiment with a CPU mapping for certain fallbacks
      sna: Correct a few assertions after enabling read-only mappings
      sna: Relax limitation on not mapping GPU bo with shadow pointers
      sna: Allow creation of a CPU map for pixmaps if needed
      sna: Allow large image uploads to utilize temporary mappings
      sna: Use the pixmap size (not drawable) to determine replacement
      sna: Add a compile flag for measuring impact of userptr uploads
      sna: Check size against aperture before attempting to perform the GTT mapping
      sna: Initialize src_bo to detect allocation failure
      sna: Use userptr to accelerate GetImage
      sna: Apply PutImage optimisations to move-to-cpu
      sna: Limit temporary userptr uploads to large busy targets or LLC machines
      sna: Tweak considering of last-cpu placement for inplace regions
      sna: Avoid allocating an active CPU bo unnecessarily
      sna: Mark uploads with async hints when appropriate
      sna: Free the SHM pixmaps after b266ae6f6f
      sna: Pass the async hint for the upload into the GPU
      sna: Hint that a copy from a SHM bo will likely be the last in a batch
      sna: Add DBG to use_shm_bo()
      sna/trapezoids: Avoid the multiply for an opaque source
      sna: Specialise sna_get_image_blt for clears to avoid sync readback
      sna: Assert that we never try to mix INPLACE / ASYNC hints for move-to-cpu
      sna: Avoid serialising on an move-to-cpu for an async operation
      sna: Revert use of a separate CAN_CREATE_SMALL flag
      sna: Add DBG for when we add the inplace hint
      sna: Fix computation of large object sizes to prevent overflow
      sna: Discard the batch if we are discarding the only buffer in it
      sna: Correct DBG to refer to the actual tiling mode forced
      sna: Restrict upload buffers to reduce sampler TLB misses
      2.20.18 release

Dave Airlie (2):
      intel: drop pointless error printf in the slave pixmap sync code.
      intel: fixup damage posting to be done correctly around slave pixmap

Matt Turner (1):
      sna: Rewrite __fls without dependence upon x86 assembly

Mickaël THOMAS (1):
      Set initial value for backlight_active_level

git tag: 2.20.18

http://xorg.freedesktop.org/archive/individual/driver/xf86-video-intel-2.20.18.tar.bz2
MD5:    56457bda125912b803f488f779460640  xf86-video-intel-2.20.18.tar.bz2
SHA1:   ca30ed7d84a02b3ccdb9e47ee876809e406822b3  xf86-video-intel-2.20.18.tar.bz2
SHA256: f3daedf9571b04234053507940ba0a221abfcd294c3c350ff49eaf499b8437b5  xf86-video-intel-2.20.18.tar.bz2

http://xorg.freedesktop.org/archive/individual/driver/xf86-video-intel-2.20.18.tar.gz
MD5:    dd1e212394f489624807495b5540924e  xf86-video-intel-2.20.18.tar.gz
SHA1:   4a285a94968af73e972b15972f57b85dfec58bc7  xf86-video-intel-2.20.18.tar.gz
SHA256: d0ae1cd7aadd7ace87f4d702df7e583c6c331e7dc5e39b9f1e70ca21416a5978  xf86-video-intel-2.20.18.tar.gz

--
Chris Wilson, Intel Open Source Technology Centre