Re: [PATCH v4 4/8] soc: mediatek: mtk-cmdq: Add pa_base parsing for unsupported subsys ID hardware

Jason-JH Lin (林睿祥) <Jason-JH.Lin@xxxxxxxxxxxx> · Wed, 5 Mar 2025 15:58:46 +0000

On Wed, 2025-03-05 at 12:03 +0100, AngeloGioacchino Del Regno wrote:
> 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Il 05/03/25 10:46, Jason-JH Lin (林睿祥) ha scritto:
> > On Tue, 2025-03-04 at 10:35 +0100, AngeloGioacchino Del Regno
> > wrote:
> > > 
> > > External email : Please do not click links or open attachments
> > > until
> > > you have verified the sender or the content.
> > > 
> > > 
> > > Il 18/02/25 06:41, Jason-JH Lin ha scritto:
> > > > When GCE executes instructions, the corresponding hardware
> > > > register
> > > > can be found through the subsys ID. For hardware that does not
> > > > support
> > > > subsys ID, its subsys ID will be set to invalid value and its
> > > > physical
> > > > address needs to be used to generate GCE instructions.
> > > > 
> > > > This commit adds a pa_base parsing flow to the cmdq_client_reg
> > > > structure
> > > > for these unsupported subsys ID hardware.
> > > > 
> > > 
> > > Does this work only for the MMINFRA located GCEs, or does this
> > > work
> > > also for
> > > the legacy ones in MT8173/83/88/92/95 // MT6795/6893/etc?
> > > 
> > > In order to actually review and decide, I do need to know :-)
> > > 
> > 
> > Yes, it's for the SoCs without subsys ID, it's not related to the
> > MMINFRA.
> > 
> > This can also work on MT8173/83/92/95 // MT6795/6893/etc.
> > You can remove the `mediatek,gce-client-reg` properties in their
> > dtsi
> > and cherry-pick this series to verify it. :-)
> > 
> 
> This is curious - and that brings more questions to the table (for
> curiosity
> more than anything else at this point).
> 
> Since this is a way to make use of the CMDQ for address ranges that
> are not tied
> to any subsys id (hence no gce-client-reg and just physical address
> parsing for
> generating instructions), do you know what are the performance
> implications of
> using this, instead of subsys IDs on SoCs that do support them?
> 

The main advantage of using subsys ID is to reduce the number of
instruction.
Without subsy ID, you will need one more `ASSIGN` instruction to assign
the high bytes of the physical address.

E,g. In mt8195-gce.h: #define SUBSYS_1c00XXXX 3

If you want GCE to write the value 0x0000000f to 0x1c00_002c.

With subsys ID, you can use only one instruction to achieve it:
1. WRITE value: 0x000000f to subsys: 0x3 + offset: 0x0002c
- OP code: WRTIE = 0x90
- subsys ID: 0x1c00XXXX = 0x03
- offset: 0x002c
- value: 0x0000000f

Without subsys ID, you will need 2 instructions to achieve it:
1. ASSIGN address high bytes: 0x1c00 to GCE temp register: SPR0
- OP code: LOGIC = 0xa0
- arg_type: register, value, value = (0x8)
- sub OP: ASSIGN = 0x0
- register index to store the assign value: SPR0 = 0x0
- value to assign: 0x1c00
2. WRITE value: 0x0000000f to temp register: SPR0 + offset:0x002c
- OP code: WRITE = 0x90
- sub OP(temp register index): SPR0 = 0x0
- offset for temp register: 0x002c
- value: 0x0000000f

> Being clear: if we were to migrate a SoC like MT8195 to using this
> globally
> instead of using subsys ids, would the performance be degraded?
> And if yes, do you know by how much?
> 

E,g. If the inst number with subsys ID is N.
1. If CMDQ is implement like this, then inst number will be (N * 2):
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c 
assign SPR0 = 0x1c00
write B to SPR0 + offset: 0x3c 
assign SPR0 = 0x1c00
write C to SPR0 + offset: 0x4c 
...

2. If CMDQ is implement like this, the inst number will be (N + 1 * n):
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c 
write B to SPR0 + offset: 0x3c
write C to SPR0 + offset: 0x4c
...

When the same cmd buffer changes the base address for n times:
assign SPR0 = 0x1c00
write A to SPR0 + offset: 0x2c
assign SPR0 = 0x1c01
write B to SPR0 + offset: 0x2c
assign SPR0 = 0x1c02
write C to SPR0 + offset: 0x2c
assign SPR0 = 0x1c00
write D to SPR0 + offset: 0x3c
...

So you can imagine the performance will increase, but maybe not too
much if we use it in the right way...
Except the old SoC that didn't support SPR and CPR. The reason will be
addressed in the next paragraph.

> What you're proposing almost looks like being too good to be true -
> and makes
> me wonder, at this point, why the subsys id was used in the first
> place :-)
> 

That's because of the old GCE version in the old SoC only support GPR,
it didn't support SPR and CPR.

GPR:
All 32 GCE threads share the same GPR0~GPR15, GPR will be affected by
other GCE threads if they use it at the same time.

SPR:
Each GCE thread has 4 SPR, SPR won't be affected by another GCE thread.

CPR:
All 32 GCE threads share the same CPR, there are over 1000 CPR can be
used. It need to be managed properly to avoid the resource conflicting.

Due to the GPR resource restriction in the old GCE version, the usage
of subsys ID can avoid GPR conflicting issues when multiple GCE threads
are using GPR to physical assign high bytes all the time.

I have simplified some complicate instruction rules, so the description
above may not be 100% matched to the CMDQ helper driver code.
But I think the main concept is correct.
Hope these explanation can help well :-)

Regards,
Jason-JH Lin

> Cheers!
> Angelo
> 
> > Regards,
> > Jason-JH Lin
> > 
> > > Thanks,
> > > Angelo
> > > 
> > > 
> > 
> 
> 
>