Re: [PATCH] drm/i915: Disable caches for Global GTT.

Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> · Thu, 6 Nov 2014 13:55:32 -0800

On Thu, Nov 6, 2014 at 12:53 PM, Ville Syrjälä
<ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> On Thu, Nov 06, 2014 at 09:09:52PM +0200, Ville Syrjälä wrote:
>> On Thu, Nov 06, 2014 at 10:55:12AM +0000, Chris Wilson wrote:
>> > On Wed, Nov 05, 2014 at 12:11:41PM +0100, Daniel Vetter wrote:
>> > > On Thu, Oct 30, 2014 at 5:18 PM, Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> wrote:
>> > > > Global GTT doesn't have pat_sel[2:0] so it always point to pat_sel = 000;
>> > > > So the only way to avoid screen corruptions is setting PAT 0 to Uncached.
>> > > >
>> > > > MOCS can still be used though. But if userspace is trusting PTE for
>> > > > cache selection the safest thing to do is to let caches disabled.
>> > > >
>> > > > BSpec: "For GGTT, there is NO pat_sel[2:0] from the entry,
>> > > > so RTL will always use the value corresponding to pat_sel = 000"
>> > > >
>> > > > Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85576
>> > > > Cc: James Ausmus <james.ausmus@xxxxxxxxx>
>> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
>> > >
>> > > Ok, I think this is -fixes + cc: stable material, but we need to be a
>> > > bit more elaborate on the commit message:
>> > >
>> > > - System agent ggtt writes (i.e. cpu gtt mmaps) already work before
>> > > this patch, i.e. the same uncached + snooping access like on gen6/7
>> > > seems to be in effect.
>> > > - So this just fixes blitter/render access. Again it looks like it's
>> > > not just uncached access, but uncached + snooping. So we can still
>> > > hold onto all our assumptions wrt cpu clflushing on LLC machines.
>> > >
>> > > I think this should be both in the commit message and code.
>> > >
>> > > Chris, please correct if I've summarized this wrongly.
>> >
>> > Me, I just get confused by the statement that writes through the System
>> > Agent to main memory are snooped.
>> >
>> > But the statement that GTT + WC is coherent with the display as it was
>> > on SNB+ is certainly true.
>>
>> I have a supicious mind so I simply had to go and verify this on my
>> IVB. Writes to a LLC cacheable scanout buffer through a GTT mmap did
>> not result in any cache dirt on the screen. So I must admit defeat.
>>
>> I also did another experiment where I wrote the data through a cpu
>> mmap, which obviously resulted in plenty of cache dirt. But I then
>> followed it with a read through the GTT mmap (obj still mapped as
>> cacheable in GTT), and the data matched what I wrote, and it also
>> cleared the cache dirt from the screen. So any CPU GTT access will
>> snoop the caches and also causes a writeback of the cacheline. I
>> suspect it also invalidates the cacheline, but I didn't verify that
>> in any way.
>>
>> So I guess I can rest easier now having witnessed it all with mine
>> own eyes :)
>
> I also repeated the same tests on CHV, and it behaves in a manner
> consistent with its non-LLC heritage ie. CPU GTT access never snoops
> no matter what the PTE says.
>
> I also frobbed with the PAT entries a bit, and that at least confirms
> that status page writes go through PAT[0] even though we set up the
> PTEs to use PAT[4]. When PTE[0] doesn't have the snoop bit, all we get
> is a GPU "hang", and with the snoop bit set things work fine.
>
> So I think that for CHV the ppgtt=0 case is perfectly fine with the
> current code because:
> 1) CPU GTT access is fine as it never snoops and we expect it not to,
>    in fact we hand a SIGBUS to anyone who tries this
> 2) GPU access to snooped bo will work since PAT[0] says to snoop
> 3) GPU access to non-snooped bo will work even if PAT[0] says to snoop,
>    but will perhaps be a bit slow due to the extra snoops. Should be
>    able to override with MOCS though and get the speed back.
> 4) Status page must be snooped, so things pretty much fall apart when
>    PAT[0] doesn't have the snoop bit. Well, unless we decide to add
>    explicit cache flushes to status page reads

Great job! Thanks for verifying that deeply.
I wasn't really sure about how we are setting the entire PAT table,
but with your tests I'm confident we are in the right direction.

>
> I'm not entirely sure if I should be coserned about the color_adjust
> stuff, especially in the case where MOCS is used. We might end up
> mixing snooped and non-snooped accesses in a way that the color
> adjustment wasn't able to anticipate.

uhm... interesting point.

> I guess I could just try to
> see if I can anger the hardware doing that. But that's a job or
> another day.

indeed! :)

>
> --
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx