Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/14/2015 8:19 AM, Daniel Vetter wrote:
On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:
On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
<michel.thierry@xxxxxxxxx> wrote:
On 10/13/2015 3:13 PM, Emil Velikov wrote:

On 13 October 2015 at 13:16, Michel Thierry <michel.thierry@xxxxxxxxx>
wrote:

On 10/6/2015 2:12 PM, Michel Thierry wrote:


On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:


On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
<michel.thierry@xxxxxxxxx> wrote:


On 9/14/2015 2:54 PM, Michał Winiarski wrote:



On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:



Gen8+ supports 48-bit virtual addresses, but some objects must
always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless
(0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in
a
32-bit range, because the General State Offset and Instruction State
Offset
are limited to 32-bits.

The i915 driver has been modified to provide a flag to set when the
4GB
limit is not necessary in a given bo
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
48-bit range will only be used when explicitly requested.

Callers to the existing drm_intel_bo_emit_reloc function should set
the
use_48b_address_range flag beforehand, in order to use full ppgtt
range.

v2: Make set/clear functions nops on pre-gen8 platforms, and use
them
        internally in emit_reloc functions (Ben)
        s/48BADDRESS/48B_ADDRESS/ (Dave)
v3: Keep set/clear functions internal, no-one needs to use them
directly.
v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
        before enabling set/clear function, print full offsets in
debug
        statements, using port of lower_32_bits and upper_32_bits
from linux
        kernel (Michał)

References:

http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>




+Kristian

LGTM.
Acked-by: Michał Winiarski <michal.winiarski@xxxxxxxxx>

Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx>





Hi Kristian,

More comments on this?
I've resent the mesa patch with the last changes


(http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).


Michał has agreed with the interface and will use it for his work.
If mesa doesn't plan to use this for now, it's ok. The kernel changes
have
been done to prevent any regressions when the support 48-bit flag is
not
set.



I didn't get any replies to my last comments on this:

http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html

Basically, I tried explicitly placing buffers above 8G and didn't see
the HW problem described (ie it all worked fine).  I still think
there's some confusion as to what the W/A is about.

Here's my take: the W/A is a SW-only workaround to handle the cases
where a driver uses heapless and 48-bit ppgtt. The problem is that the
heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
anywhere it the 48 bit address space. If that happens it's consideredd
out-of-bounds for the heap and access fails. To prevent this we need
to make sure the bos we address in a heapless fashion are placed below
4g.

For mesa, we only configure general state base as heap-less, which
affects scratch bos. What this boils down to is that we need the 4G
restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
and 3DSTATE_PS (and tesselation stage eventually). Look for the
OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
and gen8_ps_state.c



I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
isn't exclusive to the scratch bos (the error state points to that
instruction).



Hi,

Anymore inputs about this patch?
AFAIK, if changes are required based on further comments from Kristian,
these will be in the mesa patch[1], not libdrm. Also, Michał will use
this
flag in another project.

The comment seems quite brief and I'm not sure it fully addresses
Kristian's concern. I'd suspect that providing reference to the HW
documentation (confirming your assumption) might be beneficial.


Sure, attached is the hang I get if I don't set the restriction in
gen8_misc_state.c and try to use the full 48-bit range
(i915_error_state_nowa.txt). This is just running gfxbench t-rex.

I see the same hang signature when it is only applied to the scratch bos (in
gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c -
i915_error_state_scratchbo.txt).

3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf
(page 256)

I applied your mesa and libdrm patches and then backed out the 4G
restriction in the STATE_BASE_ADDRESS relocations.  I was not able to
reproduce any hang with trex or glxgears. I confirmed (using
INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and
got this:

  0: 8 @ 0xfffffff90000 (miptree)
  1: 9 @ 0xfffffff78000 (hiz)
  2: 4 @ 0xfffffff77000 (pipe_control workaround)
  3: 5 @ 0xfffffff76000 (program cache)
  4: 12 @ 0x000000000000 (miptree)
  5: 7 @ 0x000000001000 (image)
  6: 10 @ 0xfffffff5a000 (miptree)
  7: 13 @ 0xfffffff59000 (bufferobj)
  8: 11 @ 0xfffffff51000 (bufferobj)
  9: 14 @ 0xfffffff50000 (bufferobj)
10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8
(miptree)@0x0000ffff fff90000 + 0x00000000
10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9
(hiz)@0x0000ffff fff78000 + 0x00000000
10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4
(pipe_control workaround)@0x0000ffff fff77000 + 0x00000000
...

In particular, the mesa batchbuffer is placed at 0xfffffff40000, and
all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all
heap bases are set to 0xfffffff40000.

The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32
bits, but that only affects how far from the dynamic state base
address we can place that. In mesas case, it's always inside the batch
buffer, which is at most 32k, so that's never a problem. If a driver
uses dynamic state in a heapless fashion, then you need to be careful
to place the CC viewport state below 4g.

What I was able  to confirm is that scratch buffer I/O (which we use
for spilling) does break with 48 bit ppgtt. If you run glxgears with
INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as
the 4G limit on the general state heap caps attempts to spill and fill
from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up
the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem.

Ok, so we do still need the 4G limit bit (i.e. I don't have to revert the
kernel side for lack of open-source userspace), just not at the place we
originally thought. I guess lesson learned is to actually test stuff first
before I pull in the kernel side. Michel, did you not see the spilling
corruptions when testing the mesa patches? Kristian, does this also blow
up with just a piglit run?


There's must be something else we have different in our systems.

Kristian doesn't get the hang, and if I back out the 4G restriction in the STATE_BASE_ADDRESS, I don't see any corruption while running
INTEL_DEBUG=spill_fs glxgears.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux