On Fri, Oct 20, 2023, at 05:49, Jyan Chou wrote:
> We implemented cmdq feature on Synopsys DesignWare mmc driver.
> The difference between dw_mmc.c and dw_mmc_cqe.c were distinct
> register definitions, mmc user flow and the addition of cmdq.
>
> New version of User Guide had modify mmc driver's usage flow,
> we may need to renew code to precisely follow user guide.
>
> More over, We added a wait status function to satisfy synopsys
> user guide's description, since this flow might be specific in
> synopsys host driver only.
>
> Signed-off-by: Jyan Chou <jyanchou@xxxxxxxxxxx>
>
> diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> index 159a3e9490ae..894b89a396bb 100644
> --- a/drivers/mmc/host/Kconfig
> +++ b/drivers/mmc/host/Kconfig
> @@ -762,6 +762,19 @@ config MMC_DW_PLTFM
>
>	  If unsure, say Y.
>
> +config MMC_DW_CQE
> +	tristate "Synopsys DesignWare Memory Card with CQE Interface"
> +	depends on ARC || ARM || ARM64 || MIPS || COMPILE_TEST

Why do you need these specific architectures? For a generic driver, it
should not make a difference which architecture it is built for, and it
seems silly to exclude e.g. RISCV here. Maybe just use 'depends on OF'.

> +DECLARE_COMPLETION(dw_mci_wait);

I'm worried by the use of all these blocking and busy-waiting primitives
in the driver. In addition to the completion, I see a tasklet, a timer,
various readl_poll_timeout, and more.

In an ideal block device implementation, none of these should be needed,
as the block requests should get queued up from the blk_mq logic and
completed when an interrupt happens. In mmc, this is traditionally a bit
more complex, but I would hope we can avoid most of the additional
deferral in new drivers, and explain why the other ones are needed when
this is not possible.

Let's have a look at them individually.

> +#if defined(CONFIG_DEBUG_FS)
> +static int dw_mci_cqe_req_show(struct seq_file *s, void *v)
> +{
> +	struct dw_mci_slot *slot = s->private;
> +	struct mmc_request *mrq;
> +	struct mmc_command *cmd;
> +	struct mmc_command *stop;
> +	struct mmc_data *data;
> +

You should not need an #ifdef around the debugfs helpers: if you use the
registration interface correctly, it all gets dropped by dead code
elimination.

> +static int dw_mci_cqe_start_command(struct dw_mci *host,
> +		struct mmc_command *cmd, u32 cmd_flags)
> +{
....
> +
> +	host->int_waiting = &dw_mci_wait;
> +	end = jiffies + msecs_to_jiffies(DW_MCI_TIMEOUT_Ms);
> +	mod_timer(&host->timer, end);

This is just a timeout that happens when the command does not complete
within the expected time, right? If so, that is probably fine.

> +	spin_lock_irqsave(&host->irq_lock, flags);
> +	mcq_writew(host, CMD_R, cmd_flags);
> +	spin_unlock_irqrestore(&host->irq_lock, flags);
> +
> +	wait_for_completion(host->int_waiting);

This is odd for a couple of reasons:

- an MMIO write is not serialized by a spinlock because the bus fabric
  between the CPU and the device has no concept of locking, so if you
  actually rely on the lock to do something here, that is likely broken

- Using spin_lock_irqsave() implies that this function can be called
  from both atomic and non-atomic context, but then you have a blocking
  function right after it, so you could have used the much cheaper
  spin_lock_irq() here, if the lock is in fact useful

- The function is named ..._start_command(), which to me implies that
  it just pushes an operation to the device, but does not actually wait
  for it, so the wait_for_completion() seems misplaced here.
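To make that last point more concrete, something along these lines would
keep starting and waiting separate. This is a completely untested sketch
that reuses the names from your patch (dw_mci_wait, host->int_waiting,
mcq_writew, CMD_R); the dw_mci_cqe_wait_command() name and the
DW_MCI_TIMEOUT_MS constant are just placeholders for whatever fits your
state machine:

/* Sketch only: post the command without blocking; the interrupt handler
 * is expected to call complete(host->int_waiting) when the
 * command-complete interrupt fires.
 */
static void dw_mci_cqe_start_command(struct dw_mci *host,
				     struct mmc_command *cmd, u32 cmd_flags)
{
	host->int_waiting = &dw_mci_wait;
	reinit_completion(host->int_waiting);

	/* a single MMIO write does not need the irq_lock around it */
	mcq_writew(host, CMD_R, cmd_flags);
}

/* Callers that really have to block then wait explicitly, with the
 * timeout folded into the completion instead of a separate timer.
 */
static int dw_mci_cqe_wait_command(struct dw_mci *host)
{
	if (!wait_for_completion_timeout(host->int_waiting,
					 msecs_to_jiffies(DW_MCI_TIMEOUT_MS)))
		return -ETIMEDOUT;

	return 0;
}

Whether the mod_timer() based timeout can really be replaced by
wait_for_completion_timeout() like this depends on what else the timer
does in your driver, so treat that part as an open question.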
> +	if (xfer_flag == 1) {
> +		ret = readl_poll_timeout(host->regs + SDMMC_NORMAL_INT_STAT_R, status,
> +					 (status & SDMMC_XFER_COMPLETE) ==
> +					 SDMMC_XFER_COMPLETE, 10, DW_MCI_TIMEOUT);
> +		if (ret) {
> +			/*error interrupt detected*/
> +			if ((mcq_readw(host, NORMAL_INT_STAT_R) & SDMMC_ERR_INTERRUPT) &&
> +			    host->tuning)
> +				dev_info(host->dev, "Tuning error ... keep tuning\n");
> +			else
> +				dev_err(host->dev, "Timeout waiting xfer complete, status = 0x%x\n",
> +					(status & 0xffff));
> +		}
> +	} else {
> +		ret = readl_poll_timeout(host->regs + SDMMC_NORMAL_INT_STAT_R, status,
> +					 (status & SDMMC_CMD_COMPLETE) ==
> +					 SDMMC_CMD_COMPLETE, 10, DW_MCI_TIMEOUT);
> +		if (ret)
> +			dev_err(host->dev, "Timeout waiting cmd request complete\n");
> +	}

And these appear to defer the operation even more: regardless of the
xfer_flag, you always end up in an expensive poll loop that starts a
timer. If these are expected to not need the timer in normal operation,
please add some comments to explain the logic, otherwise this may need
to be restructured in a way that waits for an interrupt instead of a
slow timer.

> +
> +static int dw_mci_cqe_wait_status(struct dw_mci *host, struct mmc_command *cmd, u32 *status)
> +{
> +	struct mmc_command wait;
> +	u32 cmdr;
> +	u32 cur_state;
> +	unsigned long timeend;
> +	int err = 0;
> +
> +	/* According to Synopsys userguide, we need to send wait command after
> +	 * stop cmd to check current status
> +	 */
> +
> +	wait = host->stat_ready;
> +	memset(&wait, 0, sizeof(struct mmc_command));
> +
> +	timeend = jiffies + msecs_to_jiffies(500);
> +	do {
> +		wait.opcode = MMC_SEND_STATUS;
> +		wait.arg = 1 << 16;
> +		wait.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
> +		wait.data = NULL;
> +		cmdr = (wait.opcode << 8) | SDMMC_RESP_LEN_48 |
> +			SDMMC_CMD_CHK_RESP_CRC | SDMMC_CMD_IDX_CHK_ENABLE;
> +
> +		mcq_writew(host, XFER_MODE_R, 0);
> +		mcq_writel(host, ARGUMENT_R, wait.arg);
> +		err = dw_mci_cqe_start_command(host, &wait, cmdr);
> +		if (err) {
> +			dw_mci_cqe_reset(host);
> +			break;
> +		}
> +
> +		dw_mci_cqe_read_rsp(host, &wait, wait.resp);
> +		*status = wait.resp[0];
> +		cur_state = R1_CURRENT_STATE(wait.resp[0]);
> +		err = -DW_MCI_NOT_READY;
> +		if (cur_state == R1_STATE_TRAN) {
> +			if (wait.resp[0] & R1_READY_FOR_DATA) {
> +				err = 0;
> +				break;
> +			}
> +		}
> +	} while (time_before(jiffies, timeend));
> +
> +	return err;
> +}

My first impression here was that this is a 500ms busy loop, which would
have been crazy, but I now see that it actually blocks in the
dw_mci_cqe_start_command() function I commented on above, so it probably
does the right thing here.

This would still be clearer if you changed the ..._start_command()
function to just start the command without blocking and then had the
waiting bit in the function called ...wait_status().

> +static void dw_mci_cqe_submit_data(struct dw_mci *host, struct mmc_data *data)
> +{
> +	u32 dir = 0;
> +
> +	host->sg = NULL;
> +	host->data = data;
> +
> +	if (data->flags & MMC_DATA_READ)
> +		dir = DMA_FROM_DEVICE;
> +	else
> +		dir = DMA_TO_DEVICE;
> +
> +	host->dma_nents = dma_map_sg(mmc_dev(host->slot->mmc),
> +				     data->sg, data->sg_len, dir);
> +	host->sg = data->sg;
> +
> +	host->using_dma = 1;
> +
> +	dw_mci_cqe_submit_data_dma(host);
> +}

I think this is based on what the mmc layer expects you to do, but I
would point out that dma_map_sg() is a fairly expensive operation on
some chips, so it would be nice to be able to map the buffers in advance
and not at the time when they are about to be submitted.
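The usual way to get there is to implement the .pre_req/.post_req
callbacks in mmc_host_ops, so the core can map the buffers of the next
request while the previous one is still in flight; dw_mmc.c already does
something along these lines. Roughly like below, where the function
names are made up for the example and the host_cookie handling is only
meant as an illustration, not as the exact bookkeeping you would want:

static void dw_mci_cqe_pre_req(struct mmc_host *mmc, struct mmc_request *mrq)
{
	struct mmc_data *data = mrq->data;

	if (!data)
		return;

	/* driver-private cookie: nonzero means "already mapped" */
	data->host_cookie = dma_map_sg(mmc_dev(mmc), data->sg, data->sg_len,
				       mmc_get_dma_dir(data));
}

static void dw_mci_cqe_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
				int err)
{
	struct mmc_data *data = mrq->data;

	if (data && data->host_cookie) {
		dma_unmap_sg(mmc_dev(mmc), data->sg, data->sg_len,
			     mmc_get_dma_dir(data));
		data->host_cookie = 0;
	}
}

..._submit_data() would then only call dma_map_sg() itself when
host_cookie says the buffers are not mapped yet, and remember the
returned nents in either case.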
> +static void dw_mci_cqe_request(struct mmc_host *mmc, struct mmc_request *mrq)
> +{
> +	struct dw_mci_slot *slot = mmc_priv(mmc);
> +	struct dw_mci *host = slot->host;
> +
> +	WARN_ON(slot->mrq);
> +
> +	/*
> +	 * The check for card presence and queueing of the request must be
> +	 * atomic, otherwise the card could be removed in between and the
> +	 * request wouldn't fail until another card was inserted.
> +	 */
> +
> +	if (!dw_mci_cqe_get_cd(mmc)) {
> +		mrq->cmd->error = -ENOMEDIUM;
> +		mmc_request_done(mmc, mrq);
> +		return;
> +	}
> +
> +	down_write(&host->cr_rw_sem);
> +
> +	slot->mrq = mrq;
> +	host->mrq = mrq;
> +
> +	dw_mci_cqe_start_request(host, slot);
> +
> +	tasklet_schedule(&host->tasklet);
> +
> +	up_write(&host->cr_rw_sem);
> +}

The tasklet function pretty much only calls mmc_request_done(), which I
think can be called from any context, so why do you need to defer to a
tasklet here? (There is a rough sketch of what I mean at the end of this
mail.)

> +
> +/* Register access macros */
> +#define mcq_readl(dev, reg) \
> +	readl_relaxed((dev)->regs + SDMMC_##reg)
> +#define mcq_writel(dev, reg, value) \
> +	writel_relaxed((value), (dev)->regs + SDMMC_##reg)
> +
> +#define mcq_readw(dev, reg) \
> +	readw_relaxed((dev)->regs + SDMMC_##reg)
> +#define mcq_writew(dev, reg, value) \
> +	writew_relaxed((value), (dev)->regs + SDMMC_##reg)
> +
> +#define mcq_readb(dev, reg) \
> +	readb_relaxed((dev)->regs + SDMMC_##reg)
> +#define mcq_writeb(dev, reg, value) \
> +	writeb_relaxed((value), (dev)->regs + SDMMC_##reg)

The use of _relaxed MMIO helpers is probably a bug here, as this skips
the serialization against DMA and leads to incorrect data. Anything that
is not performance critical or that interacts with DMA should use the
normal readl/writel functions instead of the relaxed ones.

> +#define dw_mci_get_int(dev) \
> +	do { \
> +		dev->normal_interrupt = mcq_readw(dev, NORMAL_INT_STAT_R); \
> +		dev->error_interrupt = mcq_readw(dev, ERROR_INT_STAT_R); \
> +		dev->auto_error_interrupt = mcq_readw(dev, AUTO_CMD_STAT_R); \
> +	} while (0)

You only have two callers of this, so I think this would be clearer if
you open-code it.

> +/*clear status register, we always keep the card interrupt*/
> +#define dw_mci_clr_int(dev) \
> +	do { \
> +		mcq_writew(dev, ERROR_INT_STAT_R, mcq_readw(dev, ERROR_INT_STAT_R) & 0xffff); \
> +		mcq_writew(dev, NORMAL_INT_STAT_R, mcq_readw(dev, NORMAL_INT_STAT_R) & 0xffff); \
> +	} while (0)

Same here.

> +/* Board platform data */
> +struct dw_mci_board {
> +	unsigned int bus_hz; /* Clock speed at the cclk_in pad */
> +	u32 caps;	/* Capabilities */
> +	u32 caps2;	/* More capabilities */
> +	u32 pm_caps;	/* PM capabilities */
> +
> +	/* delay in mS before detecting cards after interrupt */
> +	u32 detect_delay_ms;
> +
> +	struct reset_control *rstc;
> +};

This looks like an artifact from pre-DT days; just fold this into the
dw_mci structure directly.

       Arnd
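P.S. For the tasklet point above, this is roughly what I had in mind:
complete the request straight from the interrupt handler once the last
part of it has finished. Untested, the handler name is made up, and it
obviously glosses over the actual command/data state tracking in your
driver; it only assumes the host->mrq and host->slot fields that the
patch already uses.

static irqreturn_t dw_mci_cqe_interrupt(int irq, void *dev_id)
{
	struct dw_mci *host = dev_id;
	struct mmc_request *mrq = host->mrq;

	/* ... read and ack the interrupt status, record cmd/data errors ... */

	/* once the last stage of the request has completed: */
	if (mrq) {
		host->slot->mrq = NULL;
		host->mrq = NULL;

		/* mmc_request_done() should be fine to call from here directly */
		mmc_request_done(host->slot->mmc, mrq);
	}

	return IRQ_HANDLED;
}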