Re: [PATCH 04/17] ASoC: Intel: avs: Inter process communication

Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> · Fri, 25 Feb 2022 14:37:52 -0600

>>> +static inline void avs_ipc_err(struct avs_dev *adev, struct
>>> avs_ipc_msg *tx,
>>> +                   const char *name, int error)
>>> +{
>>> +    /*
>>> +     * If IPC channel is blocked e.g.: due to ongoing recovery,
>>> +     * -EPERM error code is expected and thus it's not an actual error.
>>> +     */
>>> +    if (error == -EPERM)
>>> +        dev_dbg(adev->dev, "%s 0x%08x 0x%08x failed: %d\n", name,
>>> +            tx->glb.primary, tx->glb.ext.val, error);
>>> +    else
>>> +        dev_err(adev->dev, "%s 0x%08x 0x%08x failed: %d\n", name,
>>> +            tx->glb.primary, tx->glb.ext.val, error);
>>> +}
>>
>> we've used such functions before and the feedback, e.g. from GregKH and
>> Mark Brown, has consistenly been that this is pushing the use of dev_dbg
>> too far.
> 
> 
> In basically all cases the outcome is going to be dev_err(). dev_dbg()
> is here to help keep DSP-recovery scenario viewer-friendly when checking
> dmesg. Ideally, there should be no DSP-recoveries to begin with : )

I will refer you to this thread:

https://lore.kernel.org/alsa-devel/YGX5AUQi41z52xk8@xxxxxxxxx/

> 
>>> +#define AVS_IPC_TIMEOUT_MS    300
>>
>> skl-sst-ipc.c:#define IPC_TIMEOUT_MSECS         3000
>>
>> that's one order of magniture lower. please add a comment or align.
>>
>>> +static void avs_dsp_receive_rx(struct avs_dev *adev, u64 header)
>>> +{
>>> +    struct avs_ipc *ipc = adev->ipc;
>>> +    union avs_reply_msg msg = AVS_MSG(header);
>>> +
>>> +    ipc->rx.header = header;
>>> +    if (!msg.status)
>>> +        memcpy_fromio(ipc->rx.data, avs_uplink_addr(adev),
>>> +                  ipc->rx.size);
>>
>> it wouldn't hurt to describe that the status determines whether
>> additional information can be read from a mailbox.
> 
> 
> Isn't that consisted with the behaviour of typical API function? Do not
> copy memory and return it to the caller if something went wrong along
> the way?

oh, I thought this was a case where the header contains all the
information, and only in specific cases you need to read stuff from the
mailbox.

You definitively need to add a comment on whether this is an error
handling or a feature.

>>> +void avs_dsp_process_response(struct avs_dev *adev, u64 header)
>>> +{
>>> +    struct avs_ipc *ipc = adev->ipc;
>>> +
>>> +    if (avs_msg_is_reply(header)) {
>>
>> the naming is confusing, it's difficult for me to understand that a
>> 'response' could not be a 'reply'. The two terms are synonyms, aren't
>> they?
> 
> 
> Those two are not the same from the firmware's point of view and thus
> they are not the same here. Response is either a reply or a
> notification. Replies are solicited, a request has been sent beforehand.
> Notifications are unsolicited, you are not sure when exactly and if at
> all they arrive.

add a comment then.

> Just so I'm not called a heretic later: yes, from English dictionary
> point of view these two words are synonyms. In general, wording found in
> this drivers matches firmware equivalents wherever possible to allow
> developers to switch between these two worlds with minimal adaptation
> period possible.

> 
>>> +    /* DSP acked host's request */
>>> +    if (hipc_ack & spec->hipc_ack_done_mask) {
>>> +        /* mask done interrupt */
>>> +        snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
>>> +                      AVS_ADSP_HIPCCTL_DONE, 0);
>>> +
>>> +        complete(&ipc->done_completion);
>>> +
>>> +        /* tell DSP it has our attention */
>>> +        snd_hdac_adsp_updatel(adev, spec->hipc_ack_offset,
>>> +                      spec->hipc_ack_done_mask,
>>> +                      spec->hipc_ack_done_mask);
>>> +        /* unmask done interrupt */
>>> +        snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
>>> +                      AVS_ADSP_HIPCCTL_DONE,
>>> +                      AVS_ADSP_HIPCCTL_DONE);
>>
>> does the order between the complete() and the next two register updates
>> matter?
>>
>> I would have updated the registers immediately and signal the completion
>> later.
>>
>> I am also not sure why it's necessary to mask the done interrupt then
>> unmask it. There is nothing that seems to require this masking?
>>
>> Or are you expecting the code blocked on wait_for_completion to be
>> handled with interrupts masked, which could be quite racy?
> 
> 
> Given how the things turned out in cAVS, some steps are not always
> required. Here, we have very strict implementation and so interrupt are
> masked.
> 
> I'm unsure if relocating complete() to the bottom would bring any
> consequences. And no, the code waiting_for_completion is not expecting
> interrupts to be masked as there is no reply for ROM messages.

it would be just fine to add that the masking is added as an extra
precaution, the order does not matter and the code executed after the
complete() does not assume any masking.

> 
>>> +        ret = IRQ_HANDLED;
>>> +    }
>>> +
>>> +    /* DSP sent new response to process */
>>> +    if (hipc_rsp & spec->hipc_rsp_busy_mask) {
>>> +        /* mask busy interrupt */
>>> +        snd_hdac_adsp_updatel(adev, spec->hipc_ctl_offset,
>>> +                      AVS_ADSP_HIPCCTL_BUSY, 0);
>>> +
>>> +        ret = IRQ_WAKE_THREAD;
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>
>>> +static int avs_ipc_wait_busy_completion(struct avs_ipc *ipc, int
>>> timeout)
>>> +{
>>> +    int ret;
>>> +
>>> +again:
>>> +    ret = wait_for_completion_timeout(&ipc->busy_completion,
>>> +                      msecs_to_jiffies(timeout));
>>> +    /*
>>> +     * DSP could be unresponsive at this point e.g. manifested by
>>> +     * EXCEPTION_CAUGHT notification. If so, no point in continuing.
>>
>> EXCEPTION_CAUGHT doesn't seem to be described in this patchset, so not
>> sure what this comment might mean.
> 
> 
> Comment describes the circumstances for the communication failures and
> arrival of EXCEPTION_CAUGHT notification is one of them.

that detail is unnecessary for reviewers.

> 
>>> +     */
>>> +    if (!ipc->ready)
>>> +        return -EPERM;
>>> +
>>> +    if (!ret) {
>>> +        if (!avs_ipc_is_busy(ipc))
>>> +            return -ETIMEDOUT;
>>> +        /*
>>> +         * Firmware did its job, either notification or reply
>>> +         * has been received - now wait until it's processed.
>>> +         */
>>> +        wait_for_completion_killable(&ipc->busy_completion);
>>
>> can you elaborate on why wait_for_completion() is not enough? I haven't
>> seen the 'killable' attribute been used by anyone in sound/
> 
> 
> This is connected to how firmware handles messaging i.e. via queue. you
> may get BUSY interrupt caused by a notification while waiting for the
> reply for your request. Being 'disturbed' by a notification does not
> mean firmware is dead, it's just busy and so we wait until previous
> response is processed entirely.

this does not clarify why 'killable' is necessary?

> 
>>> +    }
>>> +
>>> +    /* Ongoing notification's bottom-half may cause early wakeup */
>>> +    spin_lock(&ipc->rx_lock);
>>> +    if (!ipc->rx_completed) {
>>> +        /* Reply delayed due to notification. */
>>> +        reinit_completion(&ipc->busy_completion);
>>> +        spin_unlock(&ipc->rx_lock);
>>> +        goto again;
>>
>> shouldn't there be some counter to avoid potential infinite loops here?
> 
> 
> This is not a bad idea although the counter is going to be high e.g.:
> 128. With DEBUG-level logs enabled you can get ton of messages before
> your reply gets finally sent.

dev_dbg() in interrupts is usually not very helpful. we're trying to
move to traces instead.

> 
>>> +    }
>>> +
>>> +    spin_unlock(&ipc->rx_lock);
>>> +    return 0;
>>> +}
>>
>>> +static int avs_dsp_do_send_msg(struct avs_dev *adev, struct
>>> avs_ipc_msg *request,
>>> +                   struct avs_ipc_msg *reply, int timeout)
>>> +{
>>> +    struct avs_ipc *ipc = adev->ipc;
>>> +    int ret;
>>> +
>>> +    if (!ipc->ready)
>>> +        return -EPERM;
>>> +
>>> +    mutex_lock(&ipc->msg_mutex);
>>> +
>>> +    spin_lock(&ipc->rx_lock);
>>> +    avs_ipc_msg_init(ipc, reply);
>>> +    avs_dsp_send_tx(adev, request);
>>> +    spin_unlock(&ipc->rx_lock);
>>> +
>>> +    ret = avs_ipc_wait_busy_completion(ipc, timeout);
>>> +    if (ret) {
>>> +        if (ret == -ETIMEDOUT) {
>>> +            dev_crit(adev->dev, "communication severed: %d,
>>> rebooting dsp..\n",
>>> +                 ret);
>>
>> dev_crit() seems over the top if there is a recovery mechanism
> 
> 
> There is just one dev_crit() within entire driver and it's there for a
> reason - communication failure is critical and in practice, should never
> occur in any scenario on the production hardware.

git grep dev_crit shows mostly this being used for temperature and
shorts in codec drivers. that seems more 'critical' than a communication
failure that likely does not result in spontaneous combustion.