On Fri, Dec 6, 2024, at 12:59, Tudor Ambarus wrote:
>>> +/**
>>> + * acpm_get_rx() - get response from RX queue.
>>> + * @achan: ACPM channel info.
>>> + * @xfer: reference to the transfer to get response for.
>>> + *
>>> + * Return: 0 on success, -errno otherwise.
>>> + */
>>> +static int acpm_get_rx(struct acpm_chan *achan, struct acpm_xfer *xfer)
>>> +{
>>> +	struct acpm_msg *tx = &xfer->tx;
>>> +	struct acpm_msg *rx = &xfer->rx;
>>> +	struct acpm_rx_data *rx_data;
>>> +	const void __iomem *base, *addr;
>>> +	u32 rx_front, rx_seqnum, tx_seqnum, seqnum;
>>> +	u32 i, val, mlen;
>>> +	bool rx_set = false;
>>> +
>>> +	rx_front = readl_relaxed(achan->rx.front);
>>> +	i = readl_relaxed(achan->rx.rear);
>>
>> If you have to use readl_relaxed(), please annotate why,
>> otherwise just use the normal readl(). Is this access to
>> the SRAM?
>
> all IO accesses in this driver are to SRAM, yes.
>
> There are no DMA accesses involved in the driver and the _relaxed()
> accessors are fully ordered for accesses to the same endpoint, so I
> thought _relaxed are fine.

My normal rule is to only use _relaxed() if it makes an intended
difference in performance or behavior, otherwise every reader of the
code has to understand why exactly this was done. More generally
speaking, use the function with the shortest name that does what you
want, since the longer functions are usually special cases that are
not the default ;-)

SRAM is a bit special, as you don't have to deal with side-effects
the same way that an MMIO register would, though there is still a
question of serializing against concurrent accesses from another
processor.

It's possible that readl_relaxed()/writel_relaxed() end up being
exactly what you want for this driver, but I would probably
encapsulate them in a local sram_readl()/sram_writel() wrapper along
with some documentation.

You should also have a look at which ioremap() version you actually
want. The default ioremap() uses PROT_DEVICE_nGnRE and has strong
ordering but not the "non-posted" guarantee of ioremap_np() that
would let you skip the read-after-write for serialization. If you
care a lot about performance on the SRAM access, you can go the
other way with a more relaxed ioremap_wc() and explicit barriers
where needed, or, even better, see what the remote side uses and use
the same mapping flags and serialization here.

>>> +	spin_lock_irqsave(&achan->tx_lock, flags);
>>> +
>>> +	tx_front = readl_relaxed(achan->tx.front);
>>> +	idx = (tx_front + 1) % achan->qlen;
>>> +
>>> +	ret = acpm_wait_for_queue_slots(achan, idx);
>>> +	if (ret) {
>>> +		spin_unlock_irqrestore(&achan->tx_lock, flags);
>>> +		return ret;
>>> +	}
>>
>> It looks like you are calling a busy loop function inside
>> of a hardirq handler here, with a 500ms timeout. This is
>> not ok.
>
> That's true, the code assumes that the method can be called from hard
> irq context.
>
> Can't tell whether that timeout is accurate, it comes from downstream
> and the resources that I have do not specify anything about what would
> be an acceptable timeout.
>
> I see arm_scmi typically uses 30 ms for transport layers.

I think 30ms is still too much for a busy loop. I see that the SCMI
driver implements both a sleeping and a spinning loop, but I have not
found exactly which callers use the spinning version, and what
timeouts those have.
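
Going back to the wrapper idea above, all I have in mind is something
as small as the sketch below (untested, names picked arbitrarily), and
the comment should spell out whichever ordering guarantees the driver
actually relies on rather than my guess here:

/*
 * All I/O in this driver goes to SRAM shared with the firmware
 * rather than to MMIO registers with side-effects, so the relaxed
 * accessors are sufficient; ordering against the remote processor
 * is provided by the queue front/rear indices.
 */
static u32 sram_readl(const void __iomem *addr)
{
	return readl_relaxed(addr);
}

static void sram_writel(u32 val, void __iomem *addr)
{
	writel_relaxed(val, addr);
}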
>> If you may need to wait for a long timeout, I would suggest
>> changing the interface so that this function is not allowed
>> to be called from irq-disabled context, change the spinlock
>> to a mutex and polling read to a sleeping version.
>
> I think the question I need to answer here is whether the acpm interface
> can be called from atomic context or not. On a first look, all
> downstream drivers call it from process context. Curious there's no clock
> enable though in downstream, which would require atomic context. I'll
> get back to you on this.
>
> If there's at least a client that calls the interface in hard/soft irq
> context (clocks?) then I don't think I'll be able to implement your
> suggestion. Each channel has its own TX/RX rings in SRAM. If there are
> requests from both hard irq and process context for the same channel,
> then I need to protect the accesses to the rings via spin_lock_irqsave.
> This is what the code assumes, because downstream allows calls from
> atomic context even if I can't pinpoint one right now.
>
> I guess I can switch everything to the sleeping version, and worry about
> atomic context when such a client gets proposed?

Yes, usually at that point the caller can be changed to use a
threaded IRQ or similar. A rough sketch of what I mean by the
sleeping version is below.

     Arnd
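
Sketch only, not even compile-tested: ACPM_TX_TIMEOUT_US, the tx.rear
member and the exact ring-full condition are guesses that would need
to match the real ring layout, and tx_lock is assumed to have become
a mutex held by the caller.

#include <linux/iopoll.h>

/* Made-up timeout; 30 ms is fine once the wait can sleep. */
#define ACPM_TX_TIMEOUT_US	30000

/*
 * Sleeping wait for a free TX slot: poll the rear index and sleep
 * between reads instead of busy-waiting with interrupts disabled.
 * The caller holds the (now mutex) tx_lock and must not be in
 * atomic context.
 */
static int acpm_wait_for_queue_slots(struct acpm_chan *achan, u32 next_idx)
{
	u32 rear;

	return readl_poll_timeout(achan->tx.rear, rear, rear != next_idx,
				  50, ACPM_TX_TIMEOUT_US);
}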