Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps

Michel Thierry <michel.thierry@xxxxxxxxx> · Thu, 13 Aug 2015 17:54:18 +0800

On 8/13/2015 5:36 PM, Zhiyuan Lv wrote:
Hi Dave,

On Wed, Aug 12, 2015 at 04:09:18PM +0100, Dave Gordon wrote:
On 12/08/15 08:56, Thierry, Michel wrote:
On 8/11/2015 1:05 PM, Zhiyuan Lv wrote:
Hi Mika/Dave/Michel,

I saw that the patch using LRI for root pointer updates has been merged into
drm-intel. When the i915 driver runs inside a virtual machine, e.g. with
XenGT, we may still need this patch from Mika, applied like below:

"
          if (intel_vgpu_active(ppgtt->base.dev))
                  gen8_preallocate_top_level_pdps(ppgtt);
"

Could you share with us your opinion? Thanks in advance!

Hi Zhiyuan,

The change looks ok to me. If you need to preallocate the PDPs,
gen8_ppgtt_init is the right place to do it. Just add a similar
vgpu_active check to disable the LRI updates (in gen8_emit_bb_start) as well.

The reason is that the LRI command makes the shadow PPGTT implementation
hard. In XenGT, we construct a shadow page table for each PPGTT in the guest
i915 driver, and then track every guest page table change in order to update
the shadow page table accordingly. The problem with page table updates made by
GPU commands is that the hypervisor cannot trap them to carry out the shadow
page table update. In XenGT, the only chance we have is the command scan at
context submission, but that is not exactly the right time to do the shadow
page table update.

Mika's patch addresses the problem nicely. With preallocation, the root
pointers in the EXECLIST context always stay the same. We can then treat any
attempt to change the guest PPGTT with GPU commands as malicious behavior. Thanks!
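
The trapping problem described above can be illustrated with a toy model. This is only a sketch and not XenGT code: every name in it (toy_shadowed_ppgtt, toy_cpu_write, toy_gpu_write, toy_shadow_in_sync) is hypothetical, and it assumes an identity guest-to-shadow translation purely to keep the comparison simple. A CPU write is assumed to be trapped, so the shadow copy is updated; a write made by a GPU command is assumed to bypass the trap.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ROOT_PTRS 4 /* assumption: four root pointers per PPGTT */

/* Toy model: the guest's root pointers and the hypervisor's shadow copy. */
struct toy_shadowed_ppgtt {
	uint64_t guest[TOY_ROOT_PTRS];
	uint64_t shadow[TOY_ROOT_PTRS];
};

/* CPU write: trapped by the hypervisor, which updates the shadow as well. */
static void toy_cpu_write(struct toy_shadowed_ppgtt *p, int idx, uint64_t val)
{
	p->guest[idx] = val;
	p->shadow[idx] = val; /* stands in for the work done in the trap handler */
}

/* Write issued by a GPU command: no trap, so the shadow is left stale. */
static void toy_gpu_write(struct toy_shadowed_ppgtt *p, int idx, uint64_t val)
{
	p->guest[idx] = val;
}

/* Returns true when every shadow entry matches its guest entry. */
static bool toy_shadow_in_sync(const struct toy_shadowed_ppgtt *p)
{
	for (int i = 0; i < TOY_ROOT_PTRS; i++)
		if (p->guest[i] != p->shadow[i])
			return false;
	return true;
}
```

After a toy_cpu_write() the two copies agree; after a toy_gpu_write() they diverge until something rescans the guest state, which is the gap that preallocation is meant to close.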

Regards,
-Zhiyuan

The bad thing that was happening if we didn't use LRIs was that the
CPU would try to push the new mappings to the GPU by updating PDP
registers in the saved context image. This is unsafe if the context
is running, as switching away from it would result in the
CPU-updated values being overwritten by the older values in the GPU
h/w registers (if the context were known to be idle, then it would
be safe).
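
The write hazard can be sketched with a small simulation. This is a toy model under stated assumptions, not driver code: toy_ctx and its helpers are hypothetical names, and a context switch is reduced to copying four PDP values between a "saved image" and the "h/w registers".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_PDPS 4

struct toy_ctx {
	uint64_t image_pdp[TOY_NUM_PDPS]; /* PDPs in the saved context image */
	uint64_t hw_pdp[TOY_NUM_PDPS];    /* PDPs held in the h/w registers */
	bool running;
};

/* Switch to the context: h/w registers are loaded from the image. */
static void toy_switch_in(struct toy_ctx *c)
{
	memcpy(c->hw_pdp, c->image_pdp, sizeof(c->hw_pdp));
	c->running = true;
}

/* Switch away: h/w registers are saved back, overwriting the image. */
static void toy_switch_out(struct toy_ctx *c)
{
	memcpy(c->image_pdp, c->hw_pdp, sizeof(c->image_pdp));
	c->running = false;
}

/* CPU pokes a new PDP value directly into the saved image. */
static void toy_cpu_update_image(struct toy_ctx *c, int idx, uint64_t val)
{
	c->image_pdp[idx] = val;
}
```

If toy_cpu_update_image() is called between toy_switch_in() and toy_switch_out(), the save on switch-out replaces the new value with the older register contents. The same call on an idle context survives the next load and save.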

Thank you very much for the detailed explanation! I am also curious:
if the root pointer update has no side effect on the currently
running context, for instance when it only changes NULL to a PD
without modifying existing pdpes, can we use the "Force PD Restore" bit
in the ctx descriptor?

We've been explicitly asked to not use "Force PD Restore".


Regards,
-Zhiyuan


Preallocating the top-level PDPs should mean that the values need
never change, so there's then no need to update the context image,
thus avoiding the write hazard :)
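
A minimal sketch of why preallocation keeps the root pointers stable, assuming the 32-bit layout of four top-level entries covering 1 GiB each. All names here (toy_prealloc_ppgtt, toy_pd, toy_preallocate_top_level, toy_map, toy_release) are hypothetical and do not mirror the i915 structures; the lower levels are reduced to a counter.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_PDPES      4  /* four top-level entries in 32-bit mode */
#define TOY_PDPE_SHIFT 30 /* each entry covers 1 GiB */

struct toy_pd {
	unsigned int mapped_pages; /* stands in for the lower-level tables */
};

struct toy_prealloc_ppgtt {
	struct toy_pd *pdp[TOY_PDPES]; /* the "root pointers" */
};

/* Allocate all top-level entries once, at init time. */
static int toy_preallocate_top_level(struct toy_prealloc_ppgtt *p)
{
	for (int i = 0; i < TOY_PDPES; i++) {
		p->pdp[i] = calloc(1, sizeof(*p->pdp[i]));
		if (!p->pdp[i]) {
			/* Unwind the entries allocated so far. */
			while (i-- > 0) {
				free(p->pdp[i]);
				p->pdp[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}

/* Which top-level entry a 32-bit GPU address falls under. */
static unsigned int toy_pdpe_index(uint32_t addr)
{
	return addr >> TOY_PDPE_SHIFT;
}

/* Mapping a page only touches state below the preallocated entry. */
static void toy_map(struct toy_prealloc_ppgtt *p, uint32_t addr)
{
	p->pdp[toy_pdpe_index(addr)]->mapped_pages++;
}

static void toy_release(struct toy_prealloc_ppgtt *p)
{
	for (int i = 0; i < TOY_PDPES; i++) {
		free(p->pdp[i]);
		p->pdp[i] = NULL;
	}
}
```

Because toy_map() never replaces an entry in pdp[], the values a context image would hold stay constant for the lifetime of the address space, at the cost of allocating all four top-level structures up front.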

.Dave.

On Thu, Jun 11, 2015 at 04:57:42PM +0300, Mika Kuoppala wrote:
Dave Gordon <david.s.gordon@xxxxxxxxx> writes:

On 10/06/15 12:42, Michel Thierry wrote:
On 5/29/2015 1:53 PM, Michel Thierry wrote:
On 5/29/2015 12:05 PM, Michel Thierry wrote:
On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
With BDW/SKL and 32bit addressing mode only, the hardware preloads
pdps. However, the TLB invalidation only has effect on levels below
the pdps. This means that if pdps change, hw might access memory
through a stale pdp entry.

To combat this problem, preallocate the top pdps so that hw sees
them as immutable for each context.

Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
Cc: Rafael Barbalho <rafael.barbalho@xxxxxxxxx>
Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
---
    drivers/gpu/drm/i915/i915_gem_gtt.c | 50 +++++++++++++++++++++++++++++++++++++
    drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
    drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
    3 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0ffd459..1a5ad4c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -941,6 +941,48 @@ err_out:
           return ret;
    }

+/* With some architectures and 32bit legacy mode, hardware pre-loads
+ * the top level pdps but the tlb invalidation only invalidates the
+ * lower levels.
+ * This might lead to hw fetching with stale pdp entries if top level
+ * structure changes, ie va space grows with dynamic page tables.
+ */

Is this still necessary if we reload PDPs via LRI instructions whenever
the address map has changed? That always (AFAICT) causes sufficient
invalidation, so then we might not need to preallocate at all :)

LRI reload gets my vote. Please ignore this patch.
-Mika

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
