Re: [RFC] drm/i915: Add a new modparam for customized ring multiplier

"Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@xxxxxxxxx> · Wed, 27 Dec 2017 17:43:00 +0000

>> I definitely asked what would happen if the GT request were bigger than the IA request. But that was a year ago and I don't remember the answer. Let me ask again. I will mail back in a few days.

Hi Chris, here is the response.

The question was: "Can we hit the RING transition penalty (at least theoretically) if the GT request is higher than the IA request, with a dominant IA load and a tiny GT load, i.e. the reverse of the situation we actually faced? For example, if we pin the IA frequency to 800MHz (x1 multiplier) and the GT frequency to 700MHz (x2 multiplier), the requests for the ring will be 800 vs. 1400."

The answer is: "In this case, if the GT will toggle between RC0 and RC6, it will force ring frequency to toggle between 800 and 1400, which in the toggling time will stall IA execution. This will lead to performance loss." However, this applies only if we really have a few toggle events within a few milliseconds. It is quite probable that the GT driver will not let this happen, simply because it doesn't toggle between RC0 and RC6 that often. Considering that the GT driver probably handles far fewer interrupts than IA, this may well be the case. So, Chris, I think the question is now for you: how often do you toggle between RC0 and RC6, and is it often enough for the reverse issue to show up? If you don't toggle much, then RING will simply stay at 1400 almost all the time and you will see no issue.
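
As a rough illustration of the 800 vs. 1400 example, here is a minimal Python sketch. It assumes a simplified model that is not taken from any HW documentation: the ring follows the highest request among the agents that are currently awake, IA is always awake, and the GT request only counts while GT is in RC0. The function names are made up for this sketch.

```python
def ring_request_mhz(freq_mhz, multiplier):
    """Ring frequency request derived from an agent frequency and its multiplier."""
    return freq_mhz * multiplier


def ring_levels(ia_req_mhz, gt_req_mhz, gt_states):
    """Ring frequency for each GT state in the sequence.

    Assumed model: IA is always awake; the GT request is only
    taken into account while GT is in RC0 (not in RC6).
    """
    levels = []
    for state in gt_states:
        active = [ia_req_mhz]
        if state == "RC0":
            active.append(gt_req_mhz)
        levels.append(max(active))
    return levels


def count_transitions(levels):
    """Number of ring frequency changes; each one is a stall in this thread's terms."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev != cur)


ia_req = ring_request_mhz(800, 1)   # IA pinned to 800MHz, x1 multiplier
gt_req = ring_request_mhz(700, 2)   # GT pinned to 700MHz, x2 multiplier

toggling = ring_levels(ia_req, gt_req, ["RC0", "RC6", "RC0", "RC6"])
steady = ring_levels(ia_req, gt_req, ["RC0", "RC0", "RC0", "RC0"])

print(toggling, count_transitions(toggling))
print(steady, count_transitions(steady))
```

Under these assumptions, a GT that keeps bouncing between RC0 and RC6 moves the ring on every state change, while a GT that stays in RC0 keeps the ring at 1400 with no transitions.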

Again, a reminder that all of this is about Gen9 only.

Dmitry.

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rogozhkin, Dmitry V
Sent: Tuesday, December 26, 2017 9:39 AM
To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; Li, Yaodong <yaodong.li@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Widawsky, Benjamin <benjamin.widawsky@xxxxxxxxx>
Subject: Re:  [RFC] drm/i915: Add a new modparam for customized ring multiplier

>> To clarify, the HW will flip between the two GT/IA requests rather than stick to the highest?

Yes, it will flip on Gen9. On Gen8 there was a HW mechanism which flattened that, but it was removed/substituted in Gen9. In Gen10 it was tuned to close the mentioned issue.

>> Do you know anything about the opposite position? I heard a suggestion that simply increasing the ringfreq universally caused thermal throttling in some other workloads. Do you have any knowledge of those?

Initially we tried to just increase the GT multiplier to x3 and ran into the throttling. Thus, we introduced a parameter to be able to mitigate all that depending on the SKU and user needs. I definitely asked what would happen if the GT request were bigger than the IA request. But that was a year ago and I don't remember the answer. Let me ask again. I will mail back in a few days.

>> You are thinking of plugging into intel_pstate to make it smarter for ia freq transitions?

Yep. This seems like the correct step: some automatic support instead of a parameter/hardcoded multiplier.

Dmitry.

-----Original Message-----
From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx] 
Sent: Tuesday, December 26, 2017 8:59 AM
To: Rogozhkin, Dmitry V <dmitry.v.rogozhkin@xxxxxxxxx>; Li, Yaodong <yaodong.li@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Gong, Zhipeng <zhipeng.gong@xxxxxxxxx>; Widawsky, Benjamin <benjamin.widawsky@xxxxxxxxx>; Mateo Lozano, Oscar <oscar.mateo@xxxxxxxxx>; Kamble, Sagar A <sagar.a.kamble@xxxxxxxxx>; Li, Yaodong <yaodong.li@xxxxxxxxx>
Subject: RE: [RFC] drm/i915: Add a new modparam for customized ring multiplier

Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider a massive load on GT and just a tiny one on IA. If GT programs the RING frequency to be lower than the IA frequency, you end up in a situation where the RING frequency constantly transitions from the GT level to the IA level and back. Each transition of the RING frequency is a full system stall. With a "good" transition rate of a few transitions per few milliseconds you will lose ~10% of performance. Media workloads step into this easily since 1) media utilizes a few GPU engines, and with a few parallel workloads you can be sure that at least 1 engine is _always_ doing something, 2) media BBs are relatively small, so you have regular wakeups of the IA to manage requests. This affects Gen9 platforms due to a HW design change (we spotted this on SKL). This will not happen on Gen8 (old HW design). This will be fixed in Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than stick to the highest? Iirc, the expectation was that we were setting a requested minimum frequency for the ring/ia based off the gpu freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is insufficient to keep the fixed functions occupied, and you need to increase the ring frequency. Is there ever a case where we don't need max ring frequency? (Perhaps we still need to set low frequency for GT idle?) I guess media is more susceptible to this as that workload should be sustainable at reduced clocks, GL et al are much more likely to keep the clocks ramped all the way up.

Do you know anything about the opposite position? I heard a suggestion that simply increasing the ringfreq universally caused thermal throttling in some other workloads. Do you have any knowledge of those?

> So, effectively, what we need to do is to make sure that the RING frequency request from GT is _not_ below the request from IA. If IA requests 2GHz, we can't request 1.4GHz, we need to request at least 2GHz. The multiplier patch was intended to do exactly that, but manually. Can we somehow automate that by managing IA frequency requests to the RING?
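
To make the arithmetic of the SKL case concrete (GT pinned to 700MHz, IA to 2GHz, x1 for IA), here is a small Python sketch. The helper names are hypothetical and this is not i915 code; it only shows the "GT ring request must not be below the IA request" rule, once as the manual multiplier choice and once as an automatic clamp.

```python
def min_gt_multiplier(gt_mhz, ia_ring_request_mhz):
    """Smallest integer multiplier m such that gt_mhz * m >= ia_ring_request_mhz."""
    return -(-ia_ring_request_mhz // gt_mhz)  # ceiling division on integers


def clamped_gt_ring_request(gt_mhz, multiplier, ia_ring_request_mhz):
    """GT ring request, raised if needed so it is never below the IA request."""
    return max(gt_mhz * multiplier, ia_ring_request_mhz)


gt_mhz = 700            # GT frequency from the SKL example
ia_ring_request = 2000  # IA at 2GHz with x1 multiplier

# With the x2 multiplier the GT request (1400) is below the IA request (2000).
print(gt_mhz * 2 >= ia_ring_request)

# Manual approach: pick a multiplier large enough to cover the IA request.
m = min_gt_multiplier(gt_mhz, ia_ring_request)
print(m, gt_mhz * m)

# Automatic approach: keep x2 but clamp the request to the IA level.
print(clamped_gt_ring_request(gt_mhz, 2, ia_ring_request))
```

For these numbers the smallest integer multiplier that covers the IA request is x3, giving a 2100MHz ring request, while the clamp keeps x2 and simply requests 2000MHz.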

You are thinking of plugging into intel_pstate to make it smarter for ia freq transitions? That seems possible, certainly. I'm not sure if the ring frequency is actually poked from anywhere else in the kernel, would be interesting to find out.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx