On 12/12/2024 20:55, Konrad Dybcio wrote:
On 11.12.2024 9:29 AM, Neil Armstrong wrote:
The Adreno GPU Management Unit (GMU) can also scale the DDR bandwidth along
with the frequency and power domain level, but by default we let the
OPP core scale the interconnect DDR path.
While scaling via the interconnect path was sufficient, newer GPUs
like the A750 require specific vote parameters and bandwidth to
achieve full functionality.
In order to calculate the vote values used by the GPU Management
Unit (GMU), we need to parse all the possible OPP bandwidths and
create a vote value to be sent to the appropriate Bus Clock
Managers (BCMs) declared in the GPU info struct.
This vote value is called IB, while on the other side the GMU also
takes another vote called AB, which is a 16-bit quantized value
of the floor bandwidth against the maximum supported bandwidth.
The AB vote will be calculated later when setting the frequency.
The vote array will then be used to dynamically generate the GMU
bw_table sent during the GMU power-up.
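For reference, a minimal sketch of what such a 16-bit AB quantization could
look like (the helper name, rounding and clamping here are assumptions for
illustration, not part of this patch):

static u16 a6xx_gmu_quantize_ab(u32 bw, u32 max_bw)
{
	/* Hypothetical: map a floor bandwidth (kBps) onto the 16-bit
	 * AB vote space, relative to the maximum supported bandwidth.
	 */
	u64 vote = (u64)bw * 0xffff;

	do_div(vote, max_bw);

	return min_t(u64, vote, 0xffff);
}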
Reviewed-by: Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx>
Signed-off-by: Neil Armstrong <neil.armstrong@xxxxxxxxxx>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 144 ++++++++++++++++++++++++++++++++++
drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 13 +++
drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
3 files changed, 158 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 14db7376c712d19446b38152e480bd5a1e0a5198..36696d372a42a27b26a018b19e73bc6d8a4a5235 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -9,6 +9,7 @@
#include <linux/pm_domain.h>
#include <linux/pm_opp.h>
#include <soc/qcom/cmd-db.h>
+#include <soc/qcom/tcs.h>
#include <drm/drm_gem.h>
#include "a6xx_gpu.h"
@@ -1287,6 +1288,101 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
return 0;
}
+/**
+ * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
+ * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
+ * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
+ * @vcd: virtual clock domain that this bcm belongs to
+ * @reserved: reserved field
+ */
+struct bcm_db {
+ __le32 unit;
+ __le16 width;
+ u8 vcd;
+ u8 reserved;
+};
No. This is a direct copypasta of drivers/interconnect/qcom/icc-rpmh.h
You cannot just randomly duplicate things..
Move it out to a shared header in include/ (and remove the duplicate from
clk-rpmh.c while at it)
Not sure if this is a good idea
I'd also really prefer if you took
drivers/interconnect/qcom/bcm-voter.c : tcs_list_gen()
and abstracted it to operate on struct bcm_db with any additional
required parameters passed as arguments.. Still left some comments
on this version if you decide to go with it
They are still very different, look closely: tcs_list_gen is designed to
operate on BW aggregations + scaling, so it would make no sense to unify them.
The calculation is simple enough, and I made it explicitly easy to read and
maintain, but honestly there's nothing special.
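For reference, a rough worked example of that conversion, with purely
illustrative numbers (not taken from any real cmd-db entry):

	/* bw = 8171875 kBps, width = 4, buswidth = 4, unit = 1000000 */
	u64 peak = (u64)8171875 * 4;	/* scale by the BCM port width -> 32687500   */

	do_div(peak, 4);		/* normalize to the bus width  -> 8171875    */
	peak *= 1000;			/* kBps -> bytes/sec           -> 8171875000 */
	do_div(peak, 1000000);		/* scale to BCM vote units     -> 8171       */
	/* clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK) -> final IB vote of 8171 */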
+
+static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
+ struct a6xx_gmu *gmu)
+{
+ const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
+ unsigned int bcm_index, bw_index, bcm_count = 0;
+
+ if (!info->bcms)
+ return 0;
You already checked that from the caller
Good catch
+
+ /* Retrieve BCM data from cmd-db */
+ for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
+ size_t count;
+
+ /* Stop at first unconfigured bcm */
+ if (!info->bcms[bcm_index].name)
+ break;
Unconfigured doesn't really fit here.. Maybe just mention the list is
NULL-terminated
Ack
+
+ bcm_data[bcm_index] = cmd_db_read_aux_data(
+ info->bcms[bcm_index].name,
+ &count);
+ if (IS_ERR(bcm_data[bcm_index]))
+ return PTR_ERR(bcm_data[bcm_index]);
+
+ if (!count)
+ return -EINVAL;
If this condition ever happens, it'll be impossible to track down,
please add an err message
Hmm sure
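Something along these lines perhaps (exact wording is just a suggestion):

	if (!count) {
		DRM_DEV_ERROR(gmu->dev, "no BCM '%s' aux data in cmd-db\n",
			      info->bcms[bcm_index].name);
		return -EINVAL;
	}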
+
+ ++bcm_count;
I've heard somewhere that prefixed increments are discouraged for
"reasons" and my OCD would like to support that
Never got this memo...
+ }
+
+ /* Generate BCM votes values for each bandwidth & BCM */
+ for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
+ u32 *data = gmu->gpu_ib_votes[bw_index];
+ u32 bw = gmu->gpu_bw_table[bw_index];
+
+ /* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
+ for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
+ bool commit = false;
+ u64 peak;
+ u32 vote;
+
+ /* Skip unconfigured BCM */
+ if (!bcm_data[bcm_index])
+ continue;
I don't see how this is useful here
It's a leftover, will drop
+
+ if (bcm_index == bcm_count - 1 ||
+ (bcm_data[bcm_index + 1] &&
+ bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
+ commit = true;
+
+ if (!bw) {
+ data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
+ continue;
+ }
+
+ if (info->bcms[bcm_index].fixed) {
You may want to take a pointer to info->bcms[bcm_index]
Sure, will help
+ u32 perfmode = 0;
+
+ if (bw >= info->bcms[bcm_index].perfmode_bw)
+ perfmode = info->bcms[bcm_index].perfmode;
+
+ data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
+ continue;
+ }
+
+ /* Multiply the bandwidth by the width of the connection */
+ peak = (u64)bw * le16_to_cpu(bcm_data[bcm_index]->width);
+ do_div(peak, info->bcms[bcm_index].buswidth);
+
+ /* Input bandwidth value is in KBps, scale the value to BCM unit */
+ peak *= 1000ULL;
I don't think this needs to be ULL since the other argument is a u64
+ do_div(peak, le32_to_cpu(bcm_data[bcm_index]->unit));
+
+ vote = clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK);
+
+ data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
x is the avg vote, y is the peak vote
downstream calculates both from the exact same value and in the same way...
Just noting down for my future self I guess, a6xx sets ab=0,
a7xx sets ab=ib like you did here
Probably, I'll need to check on that, but it can be done in a second step when enabling it on a6xx
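For the record, the two variants being discussed, both using the
BCM_TCS_CMD() helper from <soc/qcom/tcs.h> (which one a6xx will need is
still to be confirmed):

	/* a6xx downstream style: leave the x (avg/AB) field at zero */
	data[bcm_index] = BCM_TCS_CMD(commit, true, 0, vote);

	/* a7xx style, as in this patch: mirror the IB vote into both fields */
	data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);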
Konrad