Re: [Intel-gfx] [PATCH] drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers

Daniel Vetter <daniel@xxxxxxxx> · Thu, 5 Apr 2018 15:38:35 +0200

On Thu, Apr 5, 2018 at 3:31 PM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> Hi,
>
>
> On 05-04-18 15:26, Daniel Vetter wrote:
>>
>> On Thu, Apr 5, 2018 at 1:47 PM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>>
>>> On 05-04-18 09:14, Daniel Vetter wrote:
>>>>
>>>>
>>>> On Fri, Mar 30, 2018 at 02:27:15PM +0200, Hans de Goede wrote:
>>>>>
>>>>>
>>>>> Before this commit the WaSkipStolenMemoryFirstPage workaround code was
>>>>> skipping the first 4k by passing 4096 as start of the address range
>>>>> passed
>>>>> to drm_mm_init(). This means that calling drm_mm_reserve_node() to try
>>>>> and
>>>>> reserve the firmware framebuffer so that we can inherit it would always
>>>>> fail, as the firmware framebuffer starts at address 0.
>>>>>
>>>>> Commit d43537610470 ("drm/i915: skip the first 4k of stolen memory on
>>>>> everything >= gen8") says in its commit message: "This is confirmed to
>>>>> fix
>>>>> Skylake screen flickering issues (probably caused by the fact that we
>>>>> initialized a ring in the first page of stolen, but I didn't 100%
>>>>> confirm
>>>>> this theory)."
>>>>>
>>>>> Which suggests that it is safe to use the first page for a linear
>>>>> framebuffer as the firmware is doing.
>>>>>
>>>>> This commit always passes 0 as start to drm_mm_init() and works around
>>>>> WaSkipStolenMemoryFirstPage in i915_gem_stolen_insert_node_in_range()
>>>>> by insuring the start address passed by to
>>>>> drm_mm_insert_node_in_range()
>>>>> is always 4k or more. All entry points to i915_gem_stolen.c go through
>>>>> i915_gem_stolen_insert_node_in_range(), so that any newly allocated
>>>>> objects such as ring-buffers will not be allocated in the first 4k.
>>>>>
>>>>> The one exception is i915_gem_object_create_stolen_for_preallocated()
>>>>> which directly calls drm_mm_reserve_node() which now will be able to
>>>>> use the first 4k.
>>>>>
>>>>> This fixes the i915 driver no longer being able to inherit the firmware
>>>>> framebuffer on gen8+, which fixes the video output changing from the
>>>>> vendor logo to a black screen as soon as the i915 driver is loaded
>>>>> (on systems without fbcon).
>>>>>
>>>>> Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
>>>>
>>>>
>>>>
>>>> I think this is worth a shot. The only explanation I can think of why
>>>> the
>>>> GOP could get away with this and still follow the w/a is if it doesn't
>>>> have a 1:1 mapping between GGTT and stolen.
>>>
>>>
>>>
>>> My guess is that the GOP does not care about the w/a because the w/a
>>> states that the command-streamers sometimes do a spurious flush (write)
>>> to the first page, and the command-streamers are not used until much
>>> later, so the GOP is probably ignoring the w/a all together.
>>>
>>> At least that is my theory.
>>
>>
>> Atm we do not know whether the GOP actually implements the wa or not.
>> That it doesn't care is just a conjecture, but we have no proof. On
>> previous platforms the 1:1 mapping did hold, but there's no
>> fundamental reason why it sitll does.
>>
>>>> Atm we hardcode that
>>>> assumption in intel_alloc_initial_plane_obj by passing the base_aligned
>>>> as
>>>> both the stolen_offset and the gtt_offset (but it's only the gtt_offset
>>>> really). And since we're not re-writing the ptes it's not noticeable.
>>>>
>>>> I think to decide whether this is the right approach we should
>>>> double-check whether that 1:1 assumption really holds true: Either read
>>>> back the ggtt ptes and check their addresses (but iirc on some platforms
>>>> their write-only, readback doesn't work), or we also rewrite the ptes
>>>> again for preallocated stuff, like when binding a normal object into the
>>>> gtt. If either of these approaches confirms that those affected gen8+
>>>> machines still use the 1:1 mapping, then I'm happy to put my r-b on this
>>>> patch. If not, well then we at least know what to fix: We need to read
>>>> out
>>>> the real stolen_offset, instead of making assumptions.
>>>
>>>
>>>
>>> I'm happy to give this a try on one or more affected machines, I can e.g.
>>> try this on both a skylake desktop and a cherry-trail based tablet.
>>>
>>> But I'm clueless about the whole PTE stuff, so I'm going to need someone
>>> to provide me with a patch following either approach. If readback of the
>>> PTEs works on skylake / cherrytrail I guess that would be the best way.
>>>
>>> Note to test this I'm currently booting the kernel with the machine's
>>> UEFI vendor logo still being displayed when efifb kicks in. I've added:
>>> "fbcon=vc:2-62" to the kernel commandline to make fbcon not bind to
>>> VC0 / VC1, so that the framebuffer contents stays intact, combined with
>>> the patch we are discussing now, this makes the vendor logo stay on
>>> the screen all the way till GDM loads, which looks rather nice actually
>>> :)
>>>
>>> And on shutdown we fall back to the original framebuffer and the vendor
>>> logo is back again. I cannot see any corruption in the display at either
>>> boot or shutdown time.
>>
>>
>> It shouldn't matter whether efifb takes over or not, it's still the
>> GOP's framebuffer we're taking over. Different content for sure, logo
>> is gone, but we only care about which pages we're using.
>
>
> Right, I only mentioned this to explain that I'm not seeing any
> (visible) corruption.
>
>> Wrt the code:
>> - (Re)binding happens by calling i915_vma_bind, if you put a call to
>> that at the end of the stolen_preallocated functions you should get
>> that. Ofc needs the logo still there so you can see it jump/get
>> mangled.
>> - GGTT PTEs for gen8+ are written through e.g. gen8_ggttt_insert_page
>> or gen8_ggtt_insert_entries (which takes the vma). Since we only care
>> about the first entry a readq(dev_priv->ggtt->gsm) is all we really
>> need. If it's all 0 then it's not working (since at least
>> _PAGE_PRESENT should be set, plus the physical address of a page in
>> stolen memory).
>
>
> Ok, I will try to give this a shot, probably not before coming
> Monday though.

For the 2nd expirement we ofc also need the base physical address of
stolen. /proc/iomem should dump that already, but if it's not there
pls add a printk for that too. It's dev_priv->dsm.start, see
i915_pages_create_for_stolen.

>> I can do some patches for each of those, but a bit swamped. Also no
>> gen8 handy for testing I think to make sure it works, so would take a
>> few weeks probably.
>
>
> I'll give this a shot myself first and attach the patch when I reply
> with the testing results, so that you can verify that the patch works
> as expected.

Thanks a lot. I'll be travelling a bit later on next week, but I'll
try to take a look.

Cheers, Daniel
>
> Regards,
>
> Hans
>
>
>
>>
>> Cheers, Daniel
>>>
>>>
>>>
>>>>
>>>> Cheers, Daniel
>>>>>
>>>>>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_gem_stolen.c | 15 ++++++---------
>>>>>    1 file changed, 6 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>>> index af915d041281..ad949cc30928 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>>> @@ -51,6 +51,10 @@ int i915_gem_stolen_insert_node_in_range(struct
>>>>> drm_i915_private *dev_priv,
>>>>>          if (!drm_mm_initialized(&dev_priv->mm.stolen))
>>>>>                  return -ENODEV;
>>>>>    +     /* WaSkipStolenMemoryFirstPage:bdw+ */
>>>>> +       if (INTEL_GEN(dev_priv) >= 8 && start < 4096)
>>>>> +               start = 4096;
>>>>> +
>>>>>          mutex_lock(&dev_priv->mm.stolen_lock);
>>>>>          ret = drm_mm_insert_node_in_range(&dev_priv->mm.stolen, node,
>>>>>                                            size, alignment, 0,
>>>>> @@ -343,7 +347,6 @@ int i915_gem_init_stolen(struct drm_i915_private
>>>>> *dev_priv)
>>>>>    {
>>>>>          resource_size_t reserved_base, stolen_top;
>>>>>          resource_size_t reserved_total, reserved_size;
>>>>> -       resource_size_t stolen_usable_start;
>>>>>          mutex_init(&dev_priv->mm.stolen_lock);
>>>>>    @@ -435,17 +438,11 @@ int i915_gem_init_stolen(struct
>>>>> drm_i915_private
>>>>> *dev_priv)
>>>>>                           (u64)resource_size(&dev_priv->dsm) >> 10,
>>>>>                           ((u64)resource_size(&dev_priv->dsm) -
>>>>> reserved_total) >> 10);
>>>>>    -     stolen_usable_start = 0;
>>>>> -       /* WaSkipStolenMemoryFirstPage:bdw+ */
>>>>> -       if (INTEL_GEN(dev_priv) >= 8)
>>>>> -               stolen_usable_start = 4096;
>>>>> -
>>>>>          dev_priv->stolen_usable_size =
>>>>> -               resource_size(&dev_priv->dsm) - reserved_total -
>>>>> stolen_usable_start;
>>>>> +               resource_size(&dev_priv->dsm) - reserved_total;
>>>>>          /* Basic memrange allocator for stolen space. */
>>>>> -       drm_mm_init(&dev_priv->mm.stolen, stolen_usable_start,
>>>>> -                   dev_priv->stolen_usable_size);
>>>>> +       drm_mm_init(&dev_priv->mm.stolen, 0,
>>>>> dev_priv->stolen_usable_size);
>>>>>          return 0;
>>>>>    }
>>>>> --
>>>>> 2.17.0.rc2
>>>>>
>>>>> _______________________________________________
>>>>> Intel-gfx mailing list
>>>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel