On Fri, Dec 4, 2015 at 6:24 AM, Michel Thierry <michel.thierry@xxxxxxxxx> wrote: > On 11/18/2015 10:53 PM, Kristian Høgsberg wrote: >> >> On Wed, Oct 14, 2015 at 5:11 AM, Michel Thierry >> <michel.thierry@xxxxxxxxx> wrote: >>> >>> On 10/14/2015 8:19 AM, Daniel Vetter wrote: >>>> >>>> >>>> On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote: >>>>> >>>>> >>>>> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry >>>>> <michel.thierry@xxxxxxxxx> wrote: >>>>>> >>>>>> >>>>>> On 10/13/2015 3:13 PM, Emil Velikov wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 13 October 2015 at 13:16, Michel Thierry >>>>>>> <michel.thierry@xxxxxxxxx> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 10/6/2015 2:12 PM, Michel Thierry wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry >>>>>>>>>> <michel.thierry@xxxxxxxxx> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 9/14/2015 2:54 PM, Michał Winiarski wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Gen8+ supports 48-bit virtual addresses, but some objects must >>>>>>>>>>>>> always be >>>>>>>>>>>>> allocated inside the 32-bit address range. >>>>>>>>>>>>> >>>>>>>>>>>>> In specific, any resource used with flat/heapless >>>>>>>>>>>>> (0x00000000-0xfffff000) >>>>>>>>>>>>> General State Heap (GSH) or Instruction State Heap (ISH) must >>>>>>>>>>>>> be >>>>>>>>>>>>> in >>>>>>>>>>>>> a >>>>>>>>>>>>> 32-bit range, because the General State Offset and Instruction >>>>>>>>>>>>> State >>>>>>>>>>>>> Offset >>>>>>>>>>>>> are limited to 32-bits. >>>>>>>>>>>>> >>>>>>>>>>>>> The i915 driver has been modified to provide a flag to set when >>>>>>>>>>>>> the >>>>>>>>>>>>> 4GB >>>>>>>>>>>>> limit is not necessary in a given bo >>>>>>>>>>>>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). >>>>>>>>>>>>> 48-bit range will only be used when explicitly requested. >>>>>>>>>>>>> >>>>>>>>>>>>> Callers to the existing drm_intel_bo_emit_reloc function should >>>>>>>>>>>>> set >>>>>>>>>>>>> the >>>>>>>>>>>>> use_48b_address_range flag beforehand, in order to use full >>>>>>>>>>>>> ppgtt >>>>>>>>>>>>> range. >>>>>>>>>>>>> >>>>>>>>>>>>> v2: Make set/clear functions nops on pre-gen8 platforms, and >>>>>>>>>>>>> use >>>>>>>>>>>>> them >>>>>>>>>>>>> internally in emit_reloc functions (Ben) >>>>>>>>>>>>> s/48BADDRESS/48B_ADDRESS/ (Dave) >>>>>>>>>>>>> v3: Keep set/clear functions internal, no-one needs to use them >>>>>>>>>>>>> directly. >>>>>>>>>>>>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt >>>>>>>>>>>>> type >>>>>>>>>>>>> before enabling set/clear function, print full offsets >>>>>>>>>>>>> in >>>>>>>>>>>>> debug >>>>>>>>>>>>> statements, using port of lower_32_bits and >>>>>>>>>>>>> upper_32_bits >>>>>>>>>>>>> from linux >>>>>>>>>>>>> kernel (Michał) >>>>>>>>>>>>> >>>>>>>>>>>>> References: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html >>>>>>>>>>>>> Cc: Ben Widawsky <ben@xxxxxxxxxxxx> >>>>>>>>>>>>> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> +Kristian >>>>>>>>>>>> >>>>>>>>>>>> LGTM. >>>>>>>>>>>> Acked-by: Michał Winiarski <michal.winiarski@xxxxxxxxx> >>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Kristian, >>>>>>>>>>> >>>>>>>>>>> More comments on this? >>>>>>>>>>> I've resent the mesa patch with the last changes >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Michał has agreed with the interface and will use it for his >>>>>>>>>>> work. >>>>>>>>>>> If mesa doesn't plan to use this for now, it's ok. The kernel >>>>>>>>>>> changes >>>>>>>>>>> have >>>>>>>>>>> been done to prevent any regressions when the support 48-bit flag >>>>>>>>>>> is >>>>>>>>>>> not >>>>>>>>>>> set. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I didn't get any replies to my last comments on this: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html >>>>>>>>>> >>>>>>>>>> Basically, I tried explicitly placing buffers above 8G and didn't >>>>>>>>>> see >>>>>>>>>> the HW problem described (ie it all worked fine). I still think >>>>>>>>>> there's some confusion as to what the W/A is about. >>>>>>>>>> >>>>>>>>>> Here's my take: the W/A is a SW-only workaround to handle the >>>>>>>>>> cases >>>>>>>>>> where a driver uses heapless and 48-bit ppgtt. The problem is that >>>>>>>>>> the >>>>>>>>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be >>>>>>>>>> placed >>>>>>>>>> anywhere it the 48 bit address space. If that happens it's >>>>>>>>>> consideredd >>>>>>>>>> out-of-bounds for the heap and access fails. To prevent this we >>>>>>>>>> need >>>>>>>>>> to make sure the bos we address in a heapless fashion are placed >>>>>>>>>> below >>>>>>>>>> 4g. >>>>>>>>>> >>>>>>>>>> For mesa, we only configure general state base as heap-less, which >>>>>>>>>> affects scratch bos. What this boils down to is that we need the >>>>>>>>>> 4G >>>>>>>>>> restriction only for the scratch bos set up on 3DSTATE_VS, >>>>>>>>>> 3DSTATE_GS >>>>>>>>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the >>>>>>>>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, >>>>>>>>>> gen8_gs_state.c >>>>>>>>>> and gen8_ps_state.c >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe >>>>>>>>> it >>>>>>>>> isn't exclusive to the scratch bos (the error state points to that >>>>>>>>> instruction). >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Anymore inputs about this patch? >>>>>>>> AFAIK, if changes are required based on further comments from >>>>>>>> Kristian, >>>>>>>> these will be in the mesa patch[1], not libdrm. Also, Michał will >>>>>>>> use >>>>>>>> this >>>>>>>> flag in another project. >>>>>>>> >>>>>>> The comment seems quite brief and I'm not sure it fully addresses >>>>>>> Kristian's concern. I'd suspect that providing reference to the HW >>>>>>> documentation (confirming your assumption) might be beneficial. >>>>>>> >>>>>> >>>>>> Sure, attached is the hang I get if I don't set the restriction in >>>>>> gen8_misc_state.c and try to use the full 48-bit range >>>>>> (i915_error_state_nowa.txt). This is just running gfxbench t-rex. >>>>>> >>>>>> I see the same hang signature when it is only applied to the scratch >>>>>> bos >>>>>> (in >>>>>> gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c - >>>>>> i915_error_state_scratchbo.txt). >>>>>> >>>>>> 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here: >>>>>> >>>>>> >>>>>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf >>>>>> (page 256) >>>>> >>>>> >>>>> >>>>> I applied your mesa and libdrm patches and then backed out the 4G >>>>> restriction in the STATE_BASE_ADDRESS relocations. I was not able to >>>>> reproduce any hang with trex or glxgears. I confirmed (using >>>>> INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and >>>>> got this: >>>>> >>>>> 0: 8 @ 0xfffffff90000 (miptree) >>>>> 1: 9 @ 0xfffffff78000 (hiz) >>>>> 2: 4 @ 0xfffffff77000 (pipe_control workaround) >>>>> 3: 5 @ 0xfffffff76000 (program cache) >>>>> 4: 12 @ 0x000000000000 (miptree) >>>>> 5: 7 @ 0x000000001000 (image) >>>>> 6: 10 @ 0xfffffff5a000 (miptree) >>>>> 7: 13 @ 0xfffffff59000 (bufferobj) >>>>> 8: 11 @ 0xfffffff51000 (bufferobj) >>>>> 9: 14 @ 0xfffffff50000 (bufferobj) >>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8 >>>>> (miptree)@0x0000ffff fff90000 + 0x00000000 >>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9 >>>>> (hiz)@0x0000ffff fff78000 + 0x00000000 >>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4 >>>>> (pipe_control workaround)@0x0000ffff fff77000 + 0x00000000 >>>>> ... >>>>> >>>>> In particular, the mesa batchbuffer is placed at 0xfffffff40000, and >>>>> all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all >>>>> heap bases are set to 0xfffffff40000. >>>>> >>>>> The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32 >>>>> bits, but that only affects how far from the dynamic state base >>>>> address we can place that. In mesas case, it's always inside the batch >>>>> buffer, which is at most 32k, so that's never a problem. If a driver >>>>> uses dynamic state in a heapless fashion, then you need to be careful >>>>> to place the CC viewport state below 4g. >>>>> >>>>> What I was able to confirm is that scratch buffer I/O (which we use >>>>> for spilling) does break with 48 bit ppgtt. If you run glxgears with >>>>> INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as >>>>> the 4G limit on the general state heap caps attempts to spill and fill >>>>> from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up >>>>> the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem. >>>> >>>> >>>> >>>> Ok, so we do still need the 4G limit bit (i.e. I don't have to revert >>>> the >>>> kernel side for lack of open-source userspace), just not at the place we >>>> originally thought. I guess lesson learned is to actually test stuff >>>> first >>>> before I pull in the kernel side. Michel, did you not see the spilling >>>> corruptions when testing the mesa patches? Kristian, does this also blow >>>> up with just a piglit run? >>>> >>> >>> There's must be something else we have different in our systems. >>> >>> Kristian doesn't get the hang, and if I back out the 4G restriction in >>> the >>> STATE_BASE_ADDRESS, I don't see any corruption while running >>> INTEL_DEBUG=spill_fs glxgears. >> >> >> I don't want to block this, in fact, I very much want to see this move >> forward. However, I can't ack patches that 1) fix a gpu hang I can't >> reproduce and I don't agree should be a problem (the >> STATE_BASE_ADDRESS probllem) and 2) don't fix the issue that I see and >> reproduced on my BDW (the spilling problem). >> >> I'll try a full piglit run tomorrow with both my mesa patch and >> Michels and see what happens. However, this isn't the kind of problem >> that occurs once in a blue moon or for some corner case piglit test. >> If Michels understanding of the hw and the workaround is correct, >> nothing should work without his patch. >> >> To move forward on this, unless something new comes up when running >> piglit, I can take ownership of making mesa work one way or the other. >> The kernel and libdrm work is fine and most of Michels mesa patch is >> good. If mesa breaks down the road because we need to limit the >> STATE_BASE_ADDRESS relocs, I'll own the problem then. >> >> Kristian >> > > Thanks Kristian, > > Let me know if you need something from i915 or libdrm side. I've reviewed and pushed these two patches. I'll send out the mesa patch for review in a bit and once that lands we can do a libdrm release. Kristian _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel