On 11/19/2022 2:32 AM, lihuisong (C) wrote: > > 在 2022/11/18 2:09, Robbie King 写道: >> On 11/7/2022 1:24 AM, lihuisong (C) wrote: >>> >>> 在 2022/11/4 23:39, Robbie King 写道: >>>> On 11/4/2022 11:15 AM, Sudeep Holla wrote: >>>>> On Fri, Nov 04, 2022 at 11:04:22AM -0400, Robbie King wrote: >>>>>> Hello Huisong, your raising of the shared interrupt issue is very timely, I >>>>>> am working to implement "Extended PCC subspaces (types 3 and 4)" using PCC >>>>>> on the ARM RDN2 reference platform as a proof of concept, and encountered >>>>>> this issue as well. FWIW, I am currently testing using Sudeep's patch with >>>>>> the "chan_in_use" flag removed, and so far have not encountered any issues. >>>>>> >>>>> >>>>> Interesting, do you mean the patch I post in this thread but without the >>>>> whole chan_in_use flag ? >>>> >>>> That's right, diff I'm running with is attached to end of message. >>> Hello Robbie, In multiple subspaces scenario, there is a problem >>> that OS doesn't know which channel should respond to the interrupt >>> if no this chan_in_use flag. If you have not not encountered any >>> issues in this case, it may be related to your register settings. >>> >> >> Hi Huisong, apologies, I see your point now concerning multiple subspaces. >> >> I have started stress testing where I continuously generate both requests >> and notifications as quickly as possible, and unfortunately found an issue >> even with the original chan_in_use patch. I first had to modify the patch >> to get the type 4 channel notifications to function at all, essentially >> ignoring the chan_in_use flag for that channel. With that change, I still >> hit my original stress issue, where the pcc_mbox_irq function did not >> correctly ignore an interrupt for the type 3 channel. >> >> The issue occurs when a request from AP to SCP over the type 3 channel is >> outstanding, and simultaneously the SCP initiates a notification over the >> type 4 channel. Since the two channels share an interrupt, both handlers >> are invoked. >> >> I've tried to draw out the state of the channel status "free" bits along >> with the AP and SCP function calls involved. >> >> type 3 >> ------ >> >> (1)pcc.c:pcc_send_data() >> | (5) mailbox.c:mbox_chan_receive_data() >> _______v (4)pcc.c:pcc_mbox_irq() >> free \_________________________________________ >> >> ^ >> type 4 ^ >> ------ ^ >> _____________________ >> free \_____________________________ >> ^ ^ >> | | >> (2)mod_smt.c:smt_transmit() | >> | >> (3)mod_mhu2.c:raise_interrupt() >> >> The sequence of events are: >> >> 1) OS initiates request to SCP by clearing FREE in status and ringing SCP doorbell >> 2) SCP initiates notification by filling shared memory and clearing FREE in status >> 3) SCP notifies OS by ringing OS doorbell >> 4) OS first invokes interrupt handler for type 3 channel >> >> At this step, the issue is that "val" from reading status (i.e. CommandCompleteCheck) >> is zero (SCP has not responded yet) so the code below falls through and continues >> to processes the interrupt as if the request has been acknowledged by the SCP. >> >> if (val) { /* Ensure GAS exists and value is non-zero */ >> val &= pchan->cmd_complete.status_mask; >> if (!val) >> return IRQ_NONE; >> } >> >> The chan_in_use flag does not address this because the channel is indeed in use. >> >> 5) ACPI:PCC client kernel module is incorrectly notified that response data is >> available > Indeed, chan_in_use flag is invalid for type4. Thinking about this some more, I believe there is a need for the chan_in_use flag for the type 4 channels. If there are multiple SCP to AP channels sharing an interrupt, and the PCC client code chooses to acknowledge the transfer from process level (i.e. call mbox_send outside of the mbox_chan_received_data callback), then I believe a window exists where the callback could be invoked twice for the same SCP to AP channel. I've attached a diff. >> I added the following fix (applied on top of Sudeep's original patch for clarity) >> for the issue above which solved the stress test issue. I've changed the interrupt >> handler to explicitly verify that the status value matches the mask for type 3 >> interrupts before acknowledging them. Conversely, a type 4 channel verifies that >> the status value does *not* match the mask, since in this case we are functioning >> as the recipient, not the initiator. >> >> One concern is that since this fundamentally changes handling of the channel status, >> that existing platforms could be impacted. > [snip] >> >> + /* >> + * When we control data flow on the channel, we expect >> + * to see the mask bit(s) set by the remote to indicate >> + * the presence of a valid response. When we do not >> + * control the flow (i.e. type 4) the opposite is true. >> + */ >> + if (pchan->is_controller) >> + cmp = pchan->cmd_complete.status_mask; >> + else >> + cmp = 0; >> + >> + val &= pchan->cmd_complete.status_mask; >> + if (cmp != val) >> + return IRQ_NONE; >> > We don't have to use the pchan->cmd_complete.status_mask as above. > > For the communication from AP to SCP, if this channel is in use, command > complete bit is 1 indicates that the command being executed has been > completed. > For the communication from SCP to AP, if this channel is in use, command > complete bit is 0 indicates that the bit has been cleared and OS should > response the interrupt. > > So you concern should be gone if we do as following approach: > " > val &= pchan->cmd_complete.status_mask; > need_rsp_irq = pchan->is_controller ? val != 0 : val == 0; > if (!need_rsp_irq) > return IRQ_NONE > " > > This may depend on the default value of the command complete register > for each channel(must be 1, means that the channel is free for use). > It is ok for type3 because of chan_in_use flag, while something needs > to do in platform or OS for type4. >> ret = pcc_chan_reg_read(&pchan->error, &val); >> if (ret) >> @@ -704,6 +717,9 @@ static int pcc_mbox_probe(struct platform_device *pdev) >> pcc_mbox_channels[i].con_priv = pchan; >> pchan->chan.mchan = &pcc_mbox_channels[i]; >> >> + pchan->is_controller = >> + (pcct_entry->type != ACPI_PCCT_TYPE_EXT_PCC_SLAVE_SUBSPACE); >> + > This definition does not apply to all types because type1 and type2 > support bidirectional communication. > >> if (pcct_entry->type == ACPI_PCCT_TYPE_EXT_PCC_SLAVE_SUBSPACE && >> !pcc_mbox_ctrl->txdone_irq) { >> pr_err("Plaform Interrupt flag must be set to 1"); >> > > I put all points we discussed into the following modifcation. > Robbie, can you try it again for type 4 and see what else needs to be > done? > > Regards, > Huisong > Thanks Huisong, I ran my current stress test scenario against your diff with no issues (I did have to manually merge due to a tabs to spaces issue which may be totally on my end, still investigating). Here is the proposed change to support chan_in_use for type 4 (which I've also successfully tested with). I think I have solved the tabs to spaces issue for my sent messages, apologies if that's not the case. diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c index 057e00ee83c8..d4fcc913a9a8 100644 --- a/drivers/mailbox/pcc.c +++ b/drivers/mailbox/pcc.c @@ -292,7 +292,7 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p) int ret; pchan = chan->con_priv; - if (pchan->mesg_dir == PCC_ONLY_AP_TO_SCP && !pchan->chan_in_use) + if (!pchan->chan_in_use) return IRQ_NONE; ret = pcc_chan_reg_read(&pchan->cmd_complete, &val); @@ -320,8 +320,16 @@ static irqreturn_t pcc_mbox_irq(int irq, void *p) goto out; } + /* + * Clearing in_use before RX callback allows calls to mbox_send + * (which sets in_use) within the callback so SCP to AP channels + * can acknowledge transfer within IRQ context + */ + if (pchan->cmd_complete.gas) + pchan->chan_in_use = false; + mbox_chan_received_data(chan, NULL); - rc = IRQ_HANDLED; + return IRQ_HANDLED; out: if (pchan->cmd_complete.gas) @@ -772,6 +780,8 @@ static int pcc_mbox_probe(struct platform_device *pdev) goto err; } pcc_set_chan_mesg_dir(pchan, pcct_entry->type); + if (pchan->mesg_dir == PCC_ONLY_SCP_TO_AP) + pchan->chan_in_use = true; if (pcc_mbox_ctrl->txdone_irq) { rc = pcc_parse_subspace_irq(pchan, pcct_entry);