Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

Daniel Vetter <daniel@xxxxxxxx> · Wed, 14 Oct 2015 09:19:47 +0200

On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:
> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
> <michel.thierry@xxxxxxxxx> wrote:
> > On 10/13/2015 3:13 PM, Emil Velikov wrote:
> >>
> >> On 13 October 2015 at 13:16, Michel Thierry <michel.thierry@xxxxxxxxx>
> >> wrote:
> >>>
> >>> On 10/6/2015 2:12 PM, Michel Thierry wrote:
> >>>>
> >>>>
> >>>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
> >>>>> <michel.thierry@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 9/14/2015 2:54 PM, Michał Winiarski wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Gen8+ supports 48-bit virtual addresses, but some objects must
> >>>>>>>> always be
> >>>>>>>> allocated inside the 32-bit address range.
> >>>>>>>>
> >>>>>>>> In specific, any resource used with flat/heapless
> >>>>>>>> (0x00000000-0xfffff000)
> >>>>>>>> General State Heap (GSH) or Instruction State Heap (ISH) must be in
> >>>>>>>> a
> >>>>>>>> 32-bit range, because the General State Offset and Instruction State
> >>>>>>>> Offset
> >>>>>>>> are limited to 32-bits.
> >>>>>>>>
> >>>>>>>> The i915 driver has been modified to provide a flag to set when the
> >>>>>>>> 4GB
> >>>>>>>> limit is not necessary in a given bo
> >>>>>>>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
> >>>>>>>> 48-bit range will only be used when explicitly requested.
> >>>>>>>>
> >>>>>>>> Callers to the existing drm_intel_bo_emit_reloc function should set
> >>>>>>>> the
> >>>>>>>> use_48b_address_range flag beforehand, in order to use full ppgtt
> >>>>>>>> range.
> >>>>>>>>
> >>>>>>>> v2: Make set/clear functions nops on pre-gen8 platforms, and use
> >>>>>>>> them
> >>>>>>>>        internally in emit_reloc functions (Ben)
> >>>>>>>>        s/48BADDRESS/48B_ADDRESS/ (Dave)
> >>>>>>>> v3: Keep set/clear functions internal, no-one needs to use them
> >>>>>>>> directly.
> >>>>>>>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
> >>>>>>>>        before enabling set/clear function, print full offsets in
> >>>>>>>> debug
> >>>>>>>>        statements, using port of lower_32_bits and upper_32_bits
> >>>>>>>> from linux
> >>>>>>>>        kernel (Michał)
> >>>>>>>>
> >>>>>>>> References:
> >>>>>>>>
> >>>>>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
> >>>>>>>> Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> >>>>>>>> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> +Kristian
> >>>>>>>
> >>>>>>> LGTM.
> >>>>>>> Acked-by: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> >>>>>>>
> >>>>>>>> Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Hi Kristian,
> >>>>>>
> >>>>>> More comments on this?
> >>>>>> I've resent the mesa patch with the last changes
> >>>>>>
> >>>>>>
> >>>>>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).
> >>>>>>
> >>>>>>
> >>>>>> Michał has agreed with the interface and will use it for his work.
> >>>>>> If mesa doesn't plan to use this for now, it's ok. The kernel changes
> >>>>>> have
> >>>>>> been done to prevent any regressions when the support 48-bit flag is
> >>>>>> not
> >>>>>> set.
> >>>>>
> >>>>>
> >>>>>
> >>>>> I didn't get any replies to my last comments on this:
> >>>>>
> >>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
> >>>>>
> >>>>> Basically, I tried explicitly placing buffers above 8G and didn't see
> >>>>> the HW problem described (ie it all worked fine).  I still think
> >>>>> there's some confusion as to what the W/A is about.
> >>>>>
> >>>>> Here's my take: the W/A is a SW-only workaround to handle the cases
> >>>>> where a driver uses heapless and 48-bit ppgtt. The problem is that the
> >>>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
> >>>>> anywhere it the 48 bit address space. If that happens it's consideredd
> >>>>> out-of-bounds for the heap and access fails. To prevent this we need
> >>>>> to make sure the bos we address in a heapless fashion are placed below
> >>>>> 4g.
> >>>>>
> >>>>> For mesa, we only configure general state base as heap-less, which
> >>>>> affects scratch bos. What this boils down to is that we need the 4G
> >>>>> restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
> >>>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the
> >>>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
> >>>>> and gen8_ps_state.c
> >>>>
> >>>>
> >>>>
> >>>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
> >>>> isn't exclusive to the scratch bos (the error state points to that
> >>>> instruction).
> >>>>
> >>>>
> >>>
> >>> Hi,
> >>>
> >>> Anymore inputs about this patch?
> >>> AFAIK, if changes are required based on further comments from Kristian,
> >>> these will be in the mesa patch[1], not libdrm. Also, Michał will use
> >>> this
> >>> flag in another project.
> >>>
> >> The comment seems quite brief and I'm not sure it fully addresses
> >> Kristian's concern. I'd suspect that providing reference to the HW
> >> documentation (confirming your assumption) might be beneficial.
> >>
> >
> > Sure, attached is the hang I get if I don't set the restriction in
> > gen8_misc_state.c and try to use the full 48-bit range
> > (i915_error_state_nowa.txt). This is just running gfxbench t-rex.
> >
> > I see the same hang signature when it is only applied to the scratch bos (in
> > gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c -
> > i915_error_state_scratchbo.txt).
> >
> > 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here:
> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf
> > (page 256)
> 
> I applied your mesa and libdrm patches and then backed out the 4G
> restriction in the STATE_BASE_ADDRESS relocations.  I was not able to
> reproduce any hang with trex or glxgears. I confirmed (using
> INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and
> got this:
> 
>  0: 8 @ 0xfffffff90000 (miptree)
>  1: 9 @ 0xfffffff78000 (hiz)
>  2: 4 @ 0xfffffff77000 (pipe_control workaround)
>  3: 5 @ 0xfffffff76000 (program cache)
>  4: 12 @ 0x000000000000 (miptree)
>  5: 7 @ 0x000000001000 (image)
>  6: 10 @ 0xfffffff5a000 (miptree)
>  7: 13 @ 0xfffffff59000 (bufferobj)
>  8: 11 @ 0xfffffff51000 (bufferobj)
>  9: 14 @ 0xfffffff50000 (bufferobj)
> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8
> (miptree)@0x0000ffff fff90000 + 0x00000000
> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9
> (hiz)@0x0000ffff fff78000 + 0x00000000
> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4
> (pipe_control workaround)@0x0000ffff fff77000 + 0x00000000
> ...
> 
> In particular, the mesa batchbuffer is placed at 0xfffffff40000, and
> all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all
> heap bases are set to 0xfffffff40000.
> 
> The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32
> bits, but that only affects how far from the dynamic state base
> address we can place that. In mesas case, it's always inside the batch
> buffer, which is at most 32k, so that's never a problem. If a driver
> uses dynamic state in a heapless fashion, then you need to be careful
> to place the CC viewport state below 4g.
> 
> What I was able  to confirm is that scratch buffer I/O (which we use
> for spilling) does break with 48 bit ppgtt. If you run glxgears with
> INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as
> the 4G limit on the general state heap caps attempts to spill and fill
> from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up
> the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem.

Ok, so we do still need the 4G limit bit (i.e. I don't have to revert the
kernel side for lack of open-source userspace), just not at the place we
originally thought. I guess lesson learned is to actually test stuff first
before I pull in the kernel side. Michel, did you not see the spilling
corruptions when testing the mesa patches? Kristian, does this also blow
up with just a piglit run?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel