On 04/05/2017 10:21, Eero Tamminen wrote:
Hi,
On 04.05.2017 11:53, Tvrtko Ursulin wrote:
On 04/05/2017 09:35, Arkadiusz Hiler wrote:
On Thu, Apr 27, 2017 at 05:23:16PM +0100, Chris Wilson wrote:
But what is being counter-suggested is that there is no reason for these
mocs entries. If the sdk is just using mocs registers without first
programming them outside of the kernel abi, then it will be hitting
uncached memory - and then the only benefit is from simply enabling
cached access. The kernel ABI is minimalist for a reason, and we want to
know why we should be adding tables that we need to maintain forever
(bonus points for making that a consistent interface for hardware for
years to come).
-Chris
Thanks for rephrasing - that's exactly what I am concerned with.
Did you just use the MediaSDK as it is - meaning that MOCS entries
beyond the set of the 3 we have defined had been naively utilized?
If that's the case it is probably the cause of the performance
difference - everything beyond "the 3" means UNCACHED.
Can you try changing MediaSDK to use only the entries that are already
defined? How does the performance differ in that case?
Alternatively, at the time this was on my plate, Eero had suggested a
sequence of experiments: gradually replicating the default UC/WB entries
to the currently empty slots, starting on GT2 parts, and then going
forward by adding the more fine-tuned parts.
This would have shown the benefit of fine-tuned entries vs basic cached
ones. Unfortunately I never got round to doing this, but it sounded like a
really good approach to me.
I could paste these suggestions here if Eero wouldn't mind?
Of course I don't mind. :-)
Excellent, so here is what you wrote to me at that time:
------------------------------------------------------------------
You could start by copying the first ED_UC line's values to the other
ED_UC lines, and the first ED_WB line's values to the other ED_WB lines.
Then test that against the standard kernel and the VPG kernel on an SKL
GT2 machine, to evaluate the LLC settings.
If the perf of that looks good, then test the same settings on SKL GT3e
or GT4e as well, to evaluate the impact of the more fine-tuned eLLC
settings in addition to the LLC ones.
If the GT2 results don't look good, try using the ED_WB line for all
lines that have either ED_WB or L3_WB.
If that doesn't look good either, try using the ED_UC line for all lines
that have either ED_UC or L3_UC.
And if even that fails to produce good performance results, we can
conclude that the VPG kernel's fine-tuned MOCS settings are really
needed.
Please provide some spreadsheet of the results you get.
(My guess is that the first settings provide almost all of the
available speedup on GT2, but with eDRAM things aren't that
straightforward.)
------------------------------------------------------------------
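For illustration only, the three table variants described above could be
derived along the following lines. This is a rough sketch, not kernel
code: only the labels ED_UC, ED_WB, L3_UC and L3_WB come from the text
above. The Row type, the function names and the "value" strings are
hypothetical placeholders, and the sketch assumes that "the ED_WB line"
means the first ED_WB line and that each variant is derived from the
original table rather than stacked on the previous one.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    ed: str     # eDRAM/LLC label of the line, e.g. "ED_UC" or "ED_WB"
    l3: str     # L3 label of the line, e.g. "L3_UC" or "L3_WB"
    value: str  # opaque placeholder for the line's actual settings


def first_line(table: List[Row], ed: str) -> Row:
    """Return the first line that carries the given ED_* label."""
    return next(r for r in table if r.ed == ed)


def replicate_defaults(table: List[Row]) -> List[Row]:
    """Step 1: copy the first ED_UC line over every ED_UC line,
    and the first ED_WB line over every ED_WB line."""
    by_label = {
        "ED_UC": first_line(table, "ED_UC"),
        "ED_WB": first_line(table, "ED_WB"),
    }
    return [by_label.get(r.ed, r) for r in table]


def widen(table: List[Row], ed: str, l3: str) -> List[Row]:
    """Fallback steps: use the first `ed` line for every line that
    has either that ED label or the given L3 label."""
    src = first_line(table, ed)
    return [src if (r.ed == ed or r.l3 == l3) else r for r in table]


# Made-up example table; the values are not real MOCS settings.
TABLE = [
    Row("ED_UC", "L3_UC", "uc-0"),
    Row("ED_WB", "L3_WB", "wb-0"),
    Row("ED_UC", "L3_WB", "uc-1"),
    Row("ED_WB", "L3_UC", "wb-1"),
]

variant_1 = replicate_defaults(TABLE)       # first experiment
variant_2 = widen(TABLE, "ED_WB", "L3_WB")  # if GT2 results look bad
variant_3 = widen(TABLE, "ED_UC", "L3_UC")  # if that looks bad too
```

Each variant would then be benchmarked against the standard and VPG
kernels as described, GT2 first and the eDRAM parts afterwards.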
But I am also not sure if it is still relevant now that the effort of
exactly documenting the extended set of entries has started.
It's relevant in the sense that we currently don't know whether
there's any actual benefit from the new entries (i.e. was it just an
issue of VPG not using the correct existing entries).
If there is, that would be motivation to investigate their impact on
other workloads as well.
There probably is a benefit, since it is hard to imagine the fine-tuned
entries would otherwise exist. But I agree it makes sense to get a
complete understanding of the relative contribution of the individual
fine tunings.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx