Re: [PATCH 04/17] ASoC: Intel: avs: Inter process communication

Cezary Rojewski <cezary.rojewski@xxxxxxxxx> · Fri, 25 Feb 2022 19:06:04 +0100




On 2022-02-25 1:56 AM, Pierre-Louis Bossart wrote:
The boot process involving ROM-code requires specific handling with
'unstall' operations which are not required post-boot with normal IPC so
separate set of send-message handlers is added for each of the usecases.

consider splitting this long sentence and use simpler logic. It's quite
unclear how you went from boot to use cases.


Agree, ack.

+#include "messages.h"

avs_messages.h?


Same as previously, this header is for internal use only and is found 
within directory  already named 'avs'. "avs.h" header covers a wider 
range of types and that's why its name is generic. All others are 
specific and thus are not prefixed with "avs_".

  
  struct avs_dev;
  
@@ -18,6 +19,9 @@ struct avs_dsp_ops {
  	int (* const power)(struct avs_dev *, u32, bool);
  	int (* const reset)(struct avs_dev *, u32, bool);
  	int (* const stall)(struct avs_dev *, u32, bool);
+	irqreturn_t (* const irq_handler)(int, void *);
+	irqreturn_t (* const irq_thread)(int, void *);
+	void (* const int_control)(struct avs_dev *, bool);

kernel-doc or comments on what the last op might mean?


Sure, will add a comment.

  };
  
  #define avs_dsp_op(adev, op, ...) \
@@ -34,6 +38,18 @@ struct avs_spec {
  
  	const u32 core_init_mask;	/* used during DSP boot */
  	const u64 attributes;		/* bitmask of AVS_PLATATTR_* */
+	const u32 sram_base_offset;
+	const u32 sram_window_size;
+
+	const u32 rom_status;
+	const u32 hipc_req_offset;
+	const u32 hipc_req_ext_offset;
+	const u32 hipc_req_busy_mask;
+	const u32 hipc_ack_offset;
+	const u32 hipc_ack_done_mask;
+	const u32 hipc_rsp_offset;
+	const u32 hipc_rsp_busy_mask;
+	const u32 hipc_ctl_offset;

is this really desirable to describe the IPC registers, when we know
there were 3 generations of Intel IPC registers. this is ipc-1.5 only.


Indeed, this abstraction could be removed, ack.

  };
  
  struct avs_dev {
@@ -42,6 +58,9 @@ struct avs_dev {
  
  	void __iomem *adsp_ba;
  	const struct avs_spec *spec;
+	struct avs_ipc *ipc;
+
+	struct completion fw_ready;
  };
  
  /* from hda_bus to avs_dev */
@@ -61,4 +80,78 @@ int avs_dsp_core_stall(struct avs_dev *adev, u32 core_mask, bool stall);
  int avs_dsp_core_enable(struct avs_dev *adev, u32 core_mask);
  int avs_dsp_core_disable(struct avs_dev *adev, u32 core_mask);
  
+/* Inter Process Communication */
+
+struct avs_ipc_msg {
+	union {
+		u64 header;
+		union avs_global_msg glb;
+		union avs_reply_msg rsp;
+	};
+	void *data;
+	size_t size;
+};
+
+struct avs_ipc {
+	struct device *dev;
+
+	struct avs_ipc_msg rx;
+	u32 default_timeout_ms;
+	bool ready;

ready for what? This should be described or documented.


In the past this field was called "fw_ready" until we have decided to 
split struct avs_ipc from struct avs_dev. In my opinion "ipc->ready" 
looks very intuitive in the code, given that it translates to: inter 
process communication ready!

No problem with adding a comment though.

+
+	bool rx_completed;
+	spinlock_t rx_lock;
+	struct mutex msg_mutex;

checkpatch would tell you to add a comment for spinlock and mutex. it's
quite unclear what they might describe and if they are related.


I'll add a kernel-doc for just like for the ->ready field.

+	struct completion done_completion;
+	struct completion busy_completion;
+};
+
+#define AVS_EIPC	EREMOTEIO
+/*
+ * IPC handlers may return positive value (firmware error code) what denotes
+ * successful HOST <-> DSP communication yet failure to process specific request.
+ *
+ * Below macro converts returned value to linux kernel error code.
+ * All IPC callers MUST use it as soon as firmware error code is consumed.
+ */
+#define AVS_IPC_RET(ret) \
+	(((ret) <= 0) ? (ret) : -AVS_EIPC)

why not use -EREMOTEIO directly? -AVS_EIPC is not very useful for the
reader.

And why -EREMOTEIO? I see that you used it in catpt but that's a very
surprising code that no one else uses in sound/


Well, the question: "Which kernel error should represent an error coming 
from remote process AKA audio firmware" needed an answer. EREMOTEIO fits 
the description and so it was chosen.

+
+static inline void avs_ipc_err(struct avs_dev *adev, struct avs_ipc_msg *tx,
+			       const char *name, int error)
+{
+	/*
+	 * If IPC channel is blocked e.g.: due to ongoing recovery,
+	 * -EPERM error code is expected and thus it's not an actual error.
+	 */
+	if (error == -EPERM)
+		dev_dbg(adev->dev, "%s 0x%08x 0x%08x failed: %d\n", name,
+			tx->glb.primary, tx->glb.ext.val, error);
+	else
+		dev_err(adev->dev, "%s 0x%08x 0x%08x failed: %d\n", name,
+			tx->glb.primary, tx->glb.ext.val, error);
+}

we've used such functions before and the feedback, e.g. from GregKH and
Mark Brown, has consistenly been that this is pushing the use of dev_dbg
too far.


In basically all cases the outcome is going to be dev_err(). dev_dbg() 
is here to help keep DSP-recovery scenario viewer-friendly when checking 
dmesg. Ideally, there should be no DSP-recoveries to begin with : )

+#define AVS_IPC_TIMEOUT_MS	300

skl-sst-ipc.c:#define IPC_TIMEOUT_MSECS         3000

that's one order of magniture lower. please add a comment or align.

+static void avs_dsp_receive_rx(struct avs_dev *adev, u64 header)
+{
+	struct avs_ipc *ipc = adev->ipc;
+	union avs_reply_msg msg = AVS_MSG(header);
+
+	ipc->rx.header = header;
+	if (!msg.status)
+		memcpy_fromio(ipc->rx.data, avs_uplink_addr(adev),
+			      ipc->rx.size);

it wouldn't hurt to describe that the status determines whether
additional information can be read from a mailbox.


Isn't that consisted with the behaviour of typical API function? Do not 
copy memory and return it to the caller if something went wrong along 
the way?

+}
+
+static void avs_dsp_process_notification(struct avs_dev *adev, u64 header)
+{
+	struct avs_notify_mod_data mod_data;
+	union avs_notify_msg msg = AVS_MSG(header);
+	size_t data_size = 0;
+	void *data = NULL;
+
+	if (!adev->ipc->ready && msg.notify_msg_type != AVS_NOTIFY_FW_READY) {
+		dev_dbg(adev->dev, "FW not ready, skip notification: 0x%08x\n",
+			msg.primary);

can this happen?

you should add a comment on what could be sent before the first 'real'
sign of life from the DSP.

it's also unclear why this dev_dbg() when 'unknown notifications' below
are handled as dev_warn()


I would like to say: "no, this situation cannot happen" very much, but 
that's simply not true. Any notification could be sent prior to FW_READY 
as the internal queue may not always get flushed between the firmware 
restoring.

Ack on the s/warn/info/ part.

+		return;
+	}
+
+	/* Calculate notification payload size. */
+	switch (msg.notify_msg_type) {
+	case AVS_NOTIFY_FW_READY:
+		break;
+
+	case AVS_NOTIFY_PHRASE_DETECTED:
+		data_size = sizeof(struct avs_notify_voice_data);
+		break;
+
+	case AVS_NOTIFY_RESOURCE_EVENT:
+		data_size = sizeof(struct avs_notify_res_data);
+		break;
+
+	case AVS_NOTIFY_MODULE_EVENT:
+		memcpy_fromio(&mod_data, avs_uplink_addr(adev), sizeof(mod_data));
+		data_size = sizeof(mod_data) + mod_data.data_size;

it wouldn't hurt to describe the layout behing this formula.


The layout is kind of implied by the structure itself but a comment 
wouldn't hurt, agree.

+		break;
+
+	default:
+		dev_warn(adev->dev, "unknown notification: 0x%08x\n",
+			 msg.primary);
+		break;
+	}
+
+	if (data_size) {
+		data = kmalloc(data_size, GFP_KERNEL);
+		if (!data)
+			return;
+
+		memcpy_fromio(data, avs_uplink_addr(adev), data_size);
+	}
+
+	/* Perform notification-specific operations. */
+	switch (msg.notify_msg_type) {
+	case AVS_NOTIFY_FW_READY:
+		dev_dbg(adev->dev, "FW READY 0x%08x\n", msg.primary);
+		adev->ipc->ready = true;

avs->ipc->fw_ready?


As I have explained earlier, this was the case until we have separated 
struct avs_ipc from struct avs_dev. I'll provide a kernel-doc instead.

+		complete(&adev->fw_ready);> +		break;
+
+	default:
+		break;
+	}
+
+	kfree(data);
+}
+
+void avs_dsp_process_response(struct avs_dev *adev, u64 header)
+{
+	struct avs_ipc *ipc = adev->ipc;
+
+	if (avs_msg_is_reply(header)) {

the naming is confusing, it's difficult for me to understand that a
'response' could not be a 'reply'. The two terms are synonyms, aren't they?


Those two are not the same from the firmware's point of view and thus 
they are not the same here. Response is either a reply or a 
notification. Replies are solicited, a request has been sent beforehand. 
Notifications are unsolicited, you are not sure when exactly and if at 
all they arrive.

Just so I'm not called a heretic later: yes, from English dictionary 
point of view these two words are synonyms. In general, wording found in 
this drivers matches firmware equivalents wherever possible to allow 
developers to switch between these two worlds with minimal adaptation 
period possible.

+	/* DSP acked host's request */
+	if (hipc_ack & spec->hipc_ack_done_mask) {
+		/* mask done interrupt */
+		snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
+				      AVS_ADSP_HIPCCTL_DONE, 0);
+
+		complete(&ipc->done_completion);
+
+		/* tell DSP it has our attention */
+		snd_hdac_adsp_updatel(adev, spec->hipc_ack_offset,
+				      spec->hipc_ack_done_mask,
+				      spec->hipc_ack_done_mask);
+		/* unmask done interrupt */
+		snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
+				      AVS_ADSP_HIPCCTL_DONE,
+				      AVS_ADSP_HIPCCTL_DONE);

does the order between the complete() and the next two register updates
matter?

I would have updated the registers immediately and signal the completion
later.

I am also not sure why it's necessary to mask the done interrupt then
unmask it. There is nothing that seems to require this masking?

Or are you expecting the code blocked on wait_for_completion to be
handled with interrupts masked, which could be quite racy?


Given how the things turned out in cAVS, some steps are not always 
required. Here, we have very strict implementation and so interrupt are 
masked.

I'm unsure if relocating complete() to the bottom would bring any 
consequences. And no, the code waiting_for_completion is not expecting 
interrupts to be masked as there is no reply for ROM messages.

+		ret = IRQ_HANDLED;
+	}
+
+	/* DSP sent new response to process */
+	if (hipc_rsp & spec->hipc_rsp_busy_mask) {
+		/* mask busy interrupt */
+		snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
+				      AVS_ADSP_HIPCCTL_BUSY, 0);
+
+		ret = IRQ_WAKE_THREAD;
+	}
+
+	return ret;
+}

+static int avs_ipc_wait_busy_completion(struct avs_ipc *ipc, int timeout)
+{
+	int ret;
+
+again:
+	ret = wait_for_completion_timeout(&ipc->busy_completion,
+					  msecs_to_jiffies(timeout));
+	/*
+	 * DSP could be unresponsive at this point e.g. manifested by
+	 * EXCEPTION_CAUGHT notification. If so, no point in continuing.

EXCEPTION_CAUGHT doesn't seem to be described in this patchset, so not
sure what this comment might mean.


Comment describes the circumstances for the communication failures and 
arrival of EXCEPTION_CAUGHT notification is one of them.

+	 */
+	if (!ipc->ready)
+		return -EPERM;
+
+	if (!ret) {
+		if (!avs_ipc_is_busy(ipc))
+			return -ETIMEDOUT;
+		/*
+		 * Firmware did its job, either notification or reply
+		 * has been received - now wait until it's processed.
+		 */
+		wait_for_completion_killable(&ipc->busy_completion);

can you elaborate on why wait_for_completion() is not enough? I haven't
seen the 'killable' attribute been used by anyone in sound/


This is connected to how firmware handles messaging i.e. via queue. you 
may get BUSY interrupt caused by a notification while waiting for the 
reply for your request. Being 'disturbed' by a notification does not 
mean firmware is dead, it's just busy and so we wait until previous 
response is processed entirely.

+	}
+
+	/* Ongoing notification's bottom-half may cause early wakeup */
+	spin_lock(&ipc->rx_lock);
+	if (!ipc->rx_completed) {
+		/* Reply delayed due to notification. */
+		reinit_completion(&ipc->busy_completion);
+		spin_unlock(&ipc->rx_lock);
+		goto again;

shouldn't there be some counter to avoid potential infinite loops here?


This is not a bad idea although the counter is going to be high e.g.: 
128. With DEBUG-level logs enabled you can get ton of messages before 
your reply gets finally sent.

+	}
+
+	spin_unlock(&ipc->rx_lock);
+	return 0;
+}

+static int avs_dsp_do_send_msg(struct avs_dev *adev, struct avs_ipc_msg *request,
+			       struct avs_ipc_msg *reply, int timeout)
+{
+	struct avs_ipc *ipc = adev->ipc;
+	int ret;
+
+	if (!ipc->ready)
+		return -EPERM;
+
+	mutex_lock(&ipc->msg_mutex);
+
+	spin_lock(&ipc->rx_lock);
+	avs_ipc_msg_init(ipc, reply);
+	avs_dsp_send_tx(adev, request);
+	spin_unlock(&ipc->rx_lock);
+
+	ret = avs_ipc_wait_busy_completion(ipc, timeout);
+	if (ret) {
+		if (ret == -ETIMEDOUT) {
+			dev_crit(adev->dev, "communication severed: %d, rebooting dsp..\n",
+				 ret);

dev_crit() seems over the top if there is a recovery mechanism


There is just one dev_crit() within entire driver and it's there for a 
reason - communication failure is critical and in practice, should never 
occur in any scenario on the production hardware.

+
+			avs_ipc_block(ipc);
+		}
+		goto exit;
+	}
+
+	ret = ipc->rx.rsp.status;
+	if (reply) {
+		reply->header = ipc->rx.header;
+		if (reply->data && ipc->rx.size)
+			memcpy(reply->data, ipc->rx.data, reply->size);
+	}
+
+exit:
+	mutex_unlock(&ipc->msg_mutex);
+	return ret;
+}
+
+static int avs_dsp_send_msg_sequence(struct avs_dev *adev,
+				     struct avs_ipc_msg *request,
+				     struct avs_ipc_msg *reply, int timeout,
+				     bool wake_d0i0, bool schedule_d0ix)

the last two arguments are not used. is this intentional?


Used by the d0ix implementation that is not part of this part. Can relocate.

+{
+	return avs_dsp_do_send_msg(adev, request, reply, timeout);
+}
+
+int avs_dsp_send_msg_timeout(struct avs_dev *adev, struct avs_ipc_msg *request,
+			     struct avs_ipc_msg *reply, int timeout)
+{
+	return avs_dsp_send_msg_sequence(adev, request, reply, timeout,
+					 false, false);
+}
+
+int avs_dsp_send_msg(struct avs_dev *adev, struct avs_ipc_msg *request,
+		     struct avs_ipc_msg *reply)
+{
+	return avs_dsp_send_msg_timeout(adev, request, reply,
+					adev->ipc->default_timeout_ms);
+}

is there really a 4-level nesting in your helpers?

avs_dsp_send_msg
   avs_dsp_send_msg_timeout
      avs_dsp_send_msg_sequence
            avs_dsp_do_send_msg

this seems complicated, no?

At the very least you should explain what a message and message sequence
are, and why this is split this way.


With d0ix handling added, it becomes clear why such separation exists. I 
left these parts here to reduce the delta in patches that update this 
code later on. Can simplify here and update the d0ix implementation 
accordingly.

+
+int avs_dsp_send_pm_msg_timeout(struct avs_dev *adev,
+				struct avs_ipc_msg *request,
+				struct avs_ipc_msg *reply, int timeout,
+				bool wake_d0i0)
+{
+	return avs_dsp_send_msg_sequence(adev, request, reply, timeout,
+					 wake_d0i0, false);
+}

so the 'pm' means 'wake-d0i0'? that's far from intuitive.

avs_dsp_send_d0i0_msg_timeout() would better describe what you are
trying to do.

In addition you need an explanation that d0i0 is a *firmware* concept
without direct links to the *device* Dx status.


This goes for both, Dx and D0ix related messages.

+void avs_dsp_interrupt_control(struct avs_dev *adev, bool enable)
+{
+	const struct avs_spec *const spec = adev->spec;
+	u32 value;
+
+	value = enable ? AVS_ADSP_ADSPIC_IPC : 0;
+	snd_hdac_adsp_updatel(adev, AVS_ADSP_REG_ADSPIC,
+			      AVS_ADSP_ADSPIC_IPC, value);
+
+	value = enable ? AVS_ADSP_HIPCCTL_DONE : 0;
+	snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
+			      AVS_ADSP_HIPCCTL_DONE, value);
+
+	value = enable ? AVS_ADSP_HIPCCTL_BUSY : 0;
+	snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
+			      AVS_ADSP_HIPCCTL_BUSY, value);

does the order matter? please add a comment.


interrupt_control() is only used during probing and teardown procedures 
so the order does not matter. You need all those bits up before firmware 
loading can even begin. And when you are done with firmware, you zero 
these out but and the order does not matter either as again, there's 
none to "talk" to.
Note: ADSPIC_IPC is higher in hierarchy than DONE and BUSY.