Re: [PATCH v2 00/11] Drivers for gunyah hypervisor

Elliot Berman <quic_eberman@xxxxxxxxxxx> · Tue, 9 Aug 2022 17:07:39 -0700

On 8/9/2022 6:13 AM, Robin Murphy wrote:
> [drive-by observation since one thing caught my interest...]

Appreciate all the comments.

Jassi,

I understand you talked with some of our folks (Trilok and Carl) a
few years ago about using the mailbox APIs. We were steered away from
using mailboxes then. Is that still the recommendation today?

> On 2022-08-09 00:38, Elliot Berman wrote:
>>> I might be completely wrong about this, but if my in-mind picture of
>>> Gunyah is correct, I'd have implemented the gunyah core subsystem as
>>> mailbox provider, RM as a separate platform driver consuming these
>>> mailboxes and in turn being a remoteproc driver, and consoles as
>>> remoteproc subdevices.
>>
>> The mailbox framework can only fit with message queues and not
>> doorbells or vCPUs.
>
> Is that so? There was a whole long drawn-out saga around the SCMI
> protocol using the Arm MHU mailbox as a set of doorbells for
> shared-memory payloads, but it did eventually get merged as the separate
> arm_mhu_db.c driver, so unless we're talking about some completely
> different notion of "doorbell"... :/

Doorbells will be harder to fit into the mailbox framework.

 - Simple doorbells don't have any TX done acknowledgement model at
   the doorbell layer (see bullet 1 from 
https://lore.kernel.org/all/68e241fd-16f0-96b4-eab8-369628292e03@xxxxxxxxxxx/).
   Doorbell clients might have a doorbell acknowledgement flow, but the
   only client I have for doorbells doesn't. IRQFDs would send an
   empty message to the mailbox and immediately do a client-triggered
   TX_DONE.

 - Using mailboxes for the more advanced doorbell use case forces clients
   to use doorbells in a certain way: either each channel is one bit of
   the bitmask, or the client has complete control of the entire
   bitmask. I think implementing the mailbox API would force the
   otherwise-generic doorbell code to make that decision for clients.
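To make the two bullets above concrete, here is a rough userspace sketch. It is not kernel code and not Gunyah's actual doorbell ABI; `struct doorbell`, `struct chan`, `chan_send()` and `chan_client_txdone()` are made-up names for illustration. A channel either owns a fixed bit (`fixed_mask` nonzero) or lets the client pass any mask, and because a doorbell has no acknowledgement, the client has to report TX done immediately after every send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical doorbell: a 64-bit flag word that the sender ORs bits into
 * and the receiver reads and clears. Illustrative only. */
struct doorbell {
	uint64_t flags;
};

/* Minimal stand-in for a mailbox channel: at most one message in flight. */
struct chan {
	struct doorbell *db;
	uint64_t fixed_mask; /* nonzero: channel-per-bit model */
	bool busy;           /* set on send, cleared by TX done */
};

/* Returns false if the previous send has not been marked done yet. */
static bool chan_send(struct chan *ch, uint64_t mask)
{
	if (ch->busy)
		return false;
	/* In the channel-per-bit model the caller's mask is ignored. */
	ch->db->flags |= ch->fixed_mask ? ch->fixed_mask : mask;
	ch->busy = true;
	return true;
}

/* Client-triggered TX done: a doorbell has nothing to wait for, so the
 * client calls this right after each send. */
static void chan_client_txdone(struct chan *ch)
{
	ch->busy = false;
}

/* Receiver side: fetch and clear pending bits (atomically in real code). */
static uint64_t db_receive(struct doorbell *db)
{
	uint64_t pending = db->flags;

	db->flags = 0;
	return pending;
}
```

Whichever way `fixed_mask` is handled, the generic code has to pick one model up front for every client, which is the decision the second bullet is about.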

Further, I wanted to highlight one other challenge with fitting Gunyah
message queues into mailbox API:

 - Message queues track a flag which indicates whether there is space
   available in the queue. The flag is returned by msgq_send. When the
   message queue is full, an interrupt is raised once more space becomes
   available. This could be used as a TX_DONE indicator, but the mailbox
   framework's API prevents us from calling mbox_chan_txdone() inside the
   send_data channel op.

I think this might be solvable by adding a new txdone mechanism.
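Roughly what such a txdone mechanism could look like, as a toy userspace model rather than real mailbox framework code: `msgq_send()` here returns whether space remains, and the hypothetical framework inspects that result after the send op returns instead of the op calling mbox_chan_txdone() itself. If space remains, TX done is reported at once; if the queue just filled, completion waits for the simulated "space available" interrupt. The queue depth, struct layouts and function names are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MSGQ_DEPTH 2 /* hypothetical queue depth */

/* Toy message queue; semantics loosely follow "send returns whether
 * more space is available". Not the Gunyah hypercall interface. */
struct msgq {
	int count;         /* messages queued but not yet received */
	bool tx_irq_armed; /* raise "space available" IRQ on next receive */
};

struct chan {
	struct msgq *q;
	bool busy;         /* framework's "message in flight" state */
	int txdone_count;  /* how many times TX done was reported */
};

static void chan_txdone(struct chan *ch)
{
	ch->busy = false;
	ch->txdone_count++;
}

/* Returns -1 if the queue is full, else the "space available" flag. */
static int msgq_send(struct msgq *q)
{
	if (q->count == MSGQ_DEPTH)
		return -1;
	q->count++;
	return q->count < MSGQ_DEPTH;
}

/* Hypothetical txdone mode: completion is decided from the send result. */
static int chan_send(struct chan *ch)
{
	int ready;

	if (ch->busy)
		return -1;
	ready = msgq_send(ch->q);
	if (ready < 0)
		return -1;
	ch->busy = true;
	if (ready)
		chan_txdone(ch);            /* space left: done now */
	else
		ch->q->tx_irq_armed = true; /* full: wait for the IRQ */
	return 0;
}

/* Receiver drains one message; if armed, the simulated "space available"
 * interrupt fires and completes the pending TX. */
static void msgq_receive(struct msgq *q, struct chan *tx)
{
	assert(q->count > 0);
	q->count--;
	if (q->tx_irq_armed) {
		q->tx_irq_armed = false;
		chan_txdone(tx);
	}
}
```

With a depth of 2, the first send completes immediately, the second leaves the channel busy, and a third send is refused until the receiver drains a message.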

>> The mailbox framework also relies on the mailbox being defined in the
>> devicetree. RM is an exceptional case in that it is described in the
>> devicetree. Message queues for other VMs would be dynamically created
>> at runtime as/when that VM is created. Thus, the client of the message
>> queue would need to "own" both the controller and client ends of the
>> mailbox.
>
> FWIW, if the mailbox API does fit conceptually then it looks like it
> shouldn't be *too* hard to better abstract the DT details in the
> framework itself and allow providers to offer additional means to
> validate channel requests, which might be more productive than inventing
> a whole new thing.

Some notes about fitting mailboxes into Gunyah IPC:

 - A single mailbox controller can't cover all the gunyah devices. The
   number of gunyah devices is not fixed and varies per VM launched.
   The mailbox controller would need to be per-VM or per-device, with
   each channel representing a capability.

 - The other device types (like vCPU) don't fit into a message-based
   framework. I'd like to have a consistent way of binding a device's
   function to the device. If we use the mailbox API, some devices will
   use mailboxes and others will use some other mechanism. I'd prefer to
   consistently use "some other mechanism" throughout.

 - TX and RX message queues are independent, and "combining" a TX and RX
   message queue happens at the client layer, by the client requesting
   access to two otherwise unassociated message queues. A mailbox channel
   would be associated with either a TX message queue capability or an RX
   message queue capability. This isn't a major hurdle per se, but it
   makes using the mailbox APIs less clean, IMO.
     - A VM might only have a TX message queue and no RX message queue,
       or vice versa. We won't be able to require coupling a TX and RX
       message queue for the mailbox.

 - TX done acknowledgement doesn't fit Gunyah IPC (see above) and a new
   TX_DONE mode would need to be implemented.

 - Need to make it possible for a client to bind a mailbox channel
   without DT.
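A sketch of how binding could work without DT, again as hypothetical userspace C rather than a proposed kernel API: each VM gets its own table of capabilities populated at runtime, clients request a capability by ID, and the provider validates its type before handing it out. A bidirectional link is just two unrelated unidirectional requests, either of which may come back empty. `struct vm_ctrl`, `vm_request()`, `link_request()` and the capability types are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cap_type { CAP_MSGQ_TX, CAP_MSGQ_RX, CAP_DOORBELL, CAP_VCPU };

struct capability {
	uint64_t id;        /* capability ID handed out when the VM is created */
	enum cap_type type;
};

#define MAX_CAPS 8 /* arbitrary bound for the sketch */

/* One "controller" per VM: its channel count is only known at runtime. */
struct vm_ctrl {
	struct capability caps[MAX_CAPS];
	size_t ncaps;
};

static int vm_add_cap(struct vm_ctrl *vm, uint64_t id, enum cap_type type)
{
	if (vm->ncaps == MAX_CAPS)
		return -1;
	vm->caps[vm->ncaps].id = id;
	vm->caps[vm->ncaps].type = type;
	vm->ncaps++;
	return 0;
}

/* Bind by capability ID instead of a DT phandle; the provider checks that
 * the capability exists and has the type the client expects. */
static const struct capability *vm_request(const struct vm_ctrl *vm,
					   uint64_t id, enum cap_type want)
{
	for (size_t i = 0; i < vm->ncaps; i++)
		if (vm->caps[i].id == id)
			return vm->caps[i].type == want ? &vm->caps[i] : NULL;
	return NULL;
}

/* A bidirectional link is two unrelated unidirectional capabilities;
 * either side may be missing (NULL). */
struct link {
	const struct capability *tx, *rx;
};

static struct link link_request(const struct vm_ctrl *vm,
				uint64_t tx_id, uint64_t rx_id)
{
	struct link l = {
		.tx = vm_request(vm, tx_id, CAP_MSGQ_TX),
		.rx = vm_request(vm, rx_id, CAP_MSGQ_RX),
	};
	return l;
}
```

The same lookup serves vCPU or doorbell capabilities too, which is the kind of single binding mechanism the second bullet asks for.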

I'm getting a bit apprehensive about the tweaks needed to make the mailbox
framework usable for Gunyah. Will there be enough code reuse and help
with abstracting the direct-to-Gunyah APIs? IMO, there won't be, but
opinions are welcome :)

Thanks,
Elliot