Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding doc

Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx> · Fri, 22 Jun 2018 08:48:52 +0200

On Fri, Jun 22, 2018 at 05:59:30AM +0000, A.s. Dong wrote:
> Hi Oleksij,
> 
> > -----Original Message-----
> > From: Oleksij Rempel [mailto:o.rempel@xxxxxxxxxxxxxx]
> > Sent: Friday, June 22, 2018 12:59 PM
> > To: A.s. Dong <aisheng.dong@xxxxxxx>
> > Cc: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>; Rob Herring
> > <robh@xxxxxxxxxx>; Mark Rutland <mark.rutland@xxxxxxx>;
> > devicetree@xxxxxxxxxxxxxxx; dongas86@xxxxxxxxx; dl-linux-imx <linux-
> > imx@xxxxxxx>; kernel@xxxxxxxxxxxxxx; Fabio Estevam
> > <fabio.estevam@xxxxxxx>; shawnguo@xxxxxxxxxx; linux-arm-
> > kernel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding doc
> > 
> > On Fri, Jun 22, 2018 at 03:31:11AM +0000, A.s. Dong wrote:
> > > > -----Original Message-----
> > > > From: Oleksij Rempel [mailto:o.rempel@xxxxxxxxxxxxxx]
> > > > Sent: Friday, June 22, 2018 2:09 AM
> > > > To: A.s. Dong <aisheng.dong@xxxxxxx>
> > > > Cc: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>; Rob Herring
> > > > <robh@xxxxxxxxxx>; Mark Rutland <mark.rutland@xxxxxxx>;
> > > > devicetree@xxxxxxxxxxxxxxx; dongas86@xxxxxxxxx; dl-linux-imx <linux-
> > > > imx@xxxxxxx>; kernel@xxxxxxxxxxxxxx; Fabio Estevam
> > > > <fabio.estevam@xxxxxxx>; shawnguo@xxxxxxxxxx; linux-arm-
> > > > kernel@xxxxxxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH V2 2/4] dt-bindings: arm: fsl: add mu binding
> > > > doc
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Jun 21, 2018 at 05:11:33PM +0000, A.s. Dong wrote:
> > > > > Hi Sascha,
> > > > > > On Wed, Jun 20, 2018 at 01:43:10PM -0600, Rob Herring wrote:
> > > > > > > On Sun, Jun 17, 2018 at 08:49:47PM +0800, Dong Aisheng wrote:
> > > > > > > > The Messaging Unit module enables two processors within the
> > > > > > > > SoC to communicate and coordinate by passing messages (e.g.
> > > > > > > > data, status and control) through the MU interface.
> > > > > > > >
> > > > > > > > ---
> > > > > > > > v1->v2:
> > > > > > > >  * typo fixes
> > > > > > > >  * remove status property
> > > > > > > >  * remove imx6&7 compatible string which may be added later for
> > > > > > > >    the generic mailbox binding
> > > > > > > >
> > > > > > > > Note: Because MU used by SCU is not implemented as a mailbox
> > > > > > > > driver, Instead, they're provided in library calls to gain
> > > > > > > > higher
> > > > performance.
> > > > > > >
> > > > > > > Using a binding doesn't mean you have to use an OS's subsystem.
> > > > > > >
> > > > > > > What needs higher performance? What's the performance
> > difference?
> > > > > > Why
> > > > > > > can't the mailbox framework be improved?
> > > > > >
> > > > > > From what I see the performance is improved by polling the
> > > > > > interrupt registers rather than using interrupts.
> > > > > > I see no reason though why this can't be implemented with the
> > > > > > mailbox framework as is.
> > > > > >
> > > > >
> > > > > I thought you've agreed to not write generic MU(mailbox) driver for
> > SCU.
> > > > > https://www.spinics.net/lists/arm-kernel/msg650217.html
> > > > > But seems it's still not quite certain...
> > > > >
> > > > > I'd like to explain some more.
> > > > >
> > > > > 1) the interrupt mechanism is not quite suitable for SCU firmware
> > > > > protocol as the transfer size would be more than 4 words and the
> > > > > response data size is also undetermined (it's set by SCU firmware
> > > > > side
> > > > during a response).
> > > > > So polling mode may be the best way to handle this as MU message
> > > > > handling usually is quite fast in a few microseconds.
> > > > >
> > > > > 2) It's true that Mailbox framework is well designed and powerful.
> > > > > But it's not quite suitable for SCU MU as we don't need to use the
> > > > > most bits of it. Mailbox seems like to be more suitable for a
> > > > > generic MU mailbox driver used by various clients/servers.  But
> > > > > SCU MU are quite specific to SCU protocol and can't be used by
> > > > > other clients (MU
> > > > > 0~4 is fixed for SCU communication in MX8 HW design).
> > > > > Even we write a MU Mailbox driver for SCU MU, it's still not a
> > > > > general one and can't be used by others (e.g. communication with M4).
> > > > > So I'd believe the current library way is still the best approach
> > > > > for SCU MU Using. But I'm also okay for another generic MU drivers
> > > > > for other common communications between A core and M4 side.
> > > > >
> > > > > 3) We actually have tried the MU(Mailbox) internally, it increased
> > > > > a lot complexity comparing to the current library way and got a
> > > > > booting time regression issue due to extra delays introduced in
> > > > > handling SCU protocol in mailbox way.
> > > > >
> > > > > And finally a nature question to us is:
> > > > > What the benefit we can get for SCU MU using a mailbox way?
> > > > >
> > > > > If we can't find benefits but introduce more complexities then why
> > > > > we would do that way?
> > > >
> > > > Looks like my response to same topic within my patch set is lost, so
> > > > I repost it
> > > > here:
> > > >
> > > > ok.. let's take some of IMX8 SCU driver code to see if there any
> > difference:
> > > >
> > > > ..this part of the code is blocking write procedure for one
> > > > channeler (register or what ever name you prefer) per write.. correct?
> > > >
> > > > +void mu_send_msg(struct mu_priv *priv, uint32_t index, uint32_t msg)
> > {
> > > > +	uint32_t mask = MU_SR_TE0_MASK >> index;
> > > > +
> > > > +	/* Wait TX register to be empty. */
> > > > +	while (!(readl_relaxed(priv->base + MU_ASR) & mask))
> > > > +		;
> > > > +	writel_relaxed(msg, priv->base + MU_ATR0  + (index * 4)); }
> > > > +EXPORT_SYMBOL_GPL(mu_send_msg);
> > > >
> > > > According to documentation it is recommended to use only one status
> > > > bit for the last register to use MU as one big 4words sized pipe.
> > > > But, there is no way you can write to all 4 registers without
> > > > checking status for each of this register, because your protocol has
> > > > dynamic message size. So you are forced to use your one channel as 4
> > separate channels.
> > > > Write it part of the message separately and allow your firmware read
> > > > 1 word to understand how to behave on the rest of the message.
> > > >
> > > > +static void sc_ipc_write(struct sc_ipc *sc_ipc, void *data) {
> > > > +	sc_rpc_msg_t *msg = (sc_rpc_msg_t *) data;
> > > > +	uint8_t count = 0;
> > > > +
> > > > +	/* Check size */
> > > > +	if (msg->size > SC_RPC_MAX_MSG)
> > > > +		return;
> > > > +
> > > > +	/* Write first word */
> > > > +	mu_send_msg(sc_ipc->mu_base, 0, *((uint32_t *) msg));
> > > > +	count++;
> > > >
> > > > .. in this loop you are writing to one channel/register per loop. If
> > > > the communicate will stall for some reason, the linux system will
> > > > just freeze here without any timeout or error message.. no idea how
> > about the opposite site.
> > > >
> > > > +	/* Write remaining words */
> > > > +	while (count < msg->size) {
> > > > +		mu_send_msg(sc_ipc->mu_base, count % MU_TR_COUNT,
> > > > +			    msg->DATA.u32[count - 1]);
> > > > +		count++;
> > > > +	}
> > > > +}
> > > >
> > > >
> > > > ... and here is a proof that sc_ipc_write will do in some cases 5
> > > > rounds
> > > > (5 * 4 bytes = 20 bytes single message) with probable busy waiting
> > > > for each channel
> > > >
> > > > +sc_err_t sc_misc_seco_image_load(sc_ipc_t ipc, uint32_t addr_src,
> > > > +				 uint32_t addr_dst, uint32_t len, bool fw) {
> > > > +	sc_rpc_msg_t msg;
> > > > +	uint8_t result;
> > > > +
> > > > +	RPC_VER(&msg) = SC_RPC_VERSION;
> > > > +	RPC_SVC(&msg) = (uint8_t)SC_RPC_SVC_MISC;
> > > > +	RPC_FUNC(&msg) = (uint8_t)MISC_FUNC_SECO_IMAGE_LOAD;
> > > > +	RPC_U32(&msg, 0) = addr_src;
> > > > +	RPC_U32(&msg, 4) = addr_dst;
> > > > +	RPC_U32(&msg, 8) = len;
> > > > +	RPC_U8(&msg, 12) = (uint8_t)fw;
> > > > +	RPC_SIZE(&msg) = 5;
> > > > +
> > > > +	sc_call_rpc(ipc, &msg, false);
> > > > +
> > > > +	result = RPC_R8(&msg);
> > > > +	return (sc_err_t)result;
> > > > +}
> > > > +
> > > >
> > > > So, the same code with mailbox framework will be some thing like this:
> > > > 1. request all 4 channels in the probe. ignore completion callback
> > > > and set proper timeout.
> > > >
> > >
> > > That looks a bit strange. As I said in another email, MU physical does
> > > not have multi Channels (here you mean are virtual channels).  And one
> > > whole MU instance is Exclusively used by SCU, why we need abstract
> > > them into 4 channels to use separately with extra unnecessary resource
> > hold and code path executed.
> > 
> > OK, so this MU should configured to use all 4 registers as one channel.
> > I have nothing against it to do in generic driver.
> > 
> > > > 2. keep your old code by replacing this part.
> > > > 	/* Write remaining words */
> > > > 	while (count < msg->size) {
> > > > 		mbox_send_message(sc_ipc->mbox_chan[count %
> > MU_TR_COUNT],
> > > > msg->DATA.u32[count - 1]);
> > > > 		count++;
> > > > 	}
> > > >
> > > > The advantage of this variant. If SCU firmware will stall, the linux
> > > > will be able to notify about it without blocking complete system.
> > > >
> > >
> > > This part of code has been used for a long time and we've never met
> > > the stall Issue which means SCU firmware guarantee it well. But I
> > > agree a timeout mechanism Is better. However,  if only for this
> > > reason, we can simply add a timeout mechanism in MU library function
> > > as well, but still is far from a strong enough reason to switch to a more
> > complexed mailbox one.
> > 
> > At this point a have absolutely zero tolerance. At the end I was one on poor
> > devs hunting similar kind of stalls.
> > Believe me, what can theoretically with 1% probability happen in the lab, will
> > end with days/months of bug hunting work and thousands of field returns.
> > 
> 
> Yeah, so I think a timeout mechanism is worthy to add.
> 
> > > > Can you please provide (if possible) your old mailbox based
> > implementation.
> > > > I'm curious to see why it is slow.
> > >
> > > In our implementation:
> > > 1) One channel per MU
> > > 2) Tx using the same way to send msg as sc_ipc_write() in polling mode
> > > 3) Rx enables the first word interrupt and polling for the rest of
> > > them in a hrtimer.
> > >
> > > The possible extra cost comparing to simple polling way:
> > > 1) Extra unnecessary code execution path of mailbox which is not used
> > > by SCU MU
> > > 2) Interrupt latency
> > > 3) Extra memory copy handling RX message separately.
> > > 4) Extra schedule of hrtimer polling
> > >
> > > Some of them probably could be optimized. However, besides the slow
> > > problem, the extra unnecessary complexity and can't be generic
> > > (specific to SCU) are also important ones.
> > >
> > > And the MU general purpose interrupt feature and general purpose flags
> > > feature may also not supported by mailbox well.
> > 
> > Ok, I give up.
> > Can we at least try to provide one device tree binding documentation for
> > that?
> 
> I will update the binding doc to use mailbox way as suggested by Rob.
> But I still need to change mbox-cells to allow 0 which actually already exist
> In kernel. So we may have too options for MU binding for users.
> For SCU MU, mbox-cells = <0>
> For General MU, mbox-cells = <1>

ok. cool.

> > At the end, it is one and the same HW IP block spread in one SoC (or
> > different SoCs) an be used in different way. How exactly it will be used is the
> > choice of the developer/customer/user...
> > even if it will be described withing mailbox section dos not mean you are
> > forced to use mailbox framework.
> > 
> > Lets imagine worst case scenario and all of MU on i.MX8 are used by
> > different drivers. In my mailbox binding I suggest to use <vendor>,<soc>-
> > mu-<mu side>. On i.MX7 there is only one MU, so it looks like "fsl,imx7s-mu-
> > a" and "fsl,imx7s-mu-b". i.MX8 has multiple MU so it should be enumerated.
> > Probably it will look like: "fsl,imx8x-mu0-a".
> > Do it make sense for you? Can we find and agreement here? :)
> > 
> 
> IMHO one possible concern may be that results in creating two much compatibles
> strings. For example, each SoC have two (fsl, <soc>-mu-<a or b>).
> Now with MU supported SoCs, we have mx6sx, mx7ulp, Mx7d, mx8qm,
> mx8x, mx8mq and etc...
> 
> My question is that if it's really worthy to create a separate side b compatible
> string instread of distinguish it by a property?
> 
> And for general purpose MUs to communicate with M4. Both side can be used by
> either A core and M core. If we're doing this way, once users wants A core to access
> B side, users need to change the compatible string, that seems more like an unusual
> way compared to changing properties.
> 
> How do you think?

My current use case is Linux with DT running on all CPUs of one SoC. On i.MX7
there is only one MU and I use same mailbox driver on both sides. In
this case it is enough to have same compatible with extra parameter to
see the difference between A and B side. 
Just imagine, you'll use DT in your firmware on i.MX8. Will you use same
driver on both sides of MU? If yes, I'm OK to exclude A and B from
compatible.

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Attachment:
signature.asc

Description: PGP signature