> -----Original Message----- > From: linux-mmc-owner@xxxxxxxxxxxxxxx > [mailto:linux-mmc-owner@xxxxxxxxxxxxxxx] On Behalf Of Baolin Wang > Sent: 2020年2月19日 9:35 > To: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>; Asutosh Das > <asutoshd@xxxxxxxxxxxxxx>; Orson Zhai <orsonzhai@xxxxxxxxx>; Chunyan > Zhang <zhang.lyra@xxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; Linus > Walleij <linus.walleij@xxxxxxxxxx>; Baolin Wang <baolin.wang@xxxxxxxxxx>; > linux-mmc@xxxxxxxxxxxxxxx; Linux Kernel Mailing List > <linux-kernel@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH v9 0/5] Add MMC software queue support > > On Wed, Feb 19, 2020 at 7:38 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> > wrote: > > > > On Wed, 12 Feb 2020 at 05:14, Baolin Wang <baolin.wang7@xxxxxxxxx> > wrote: > > > > > > Hi All, > > > > > > Now the MMC read/write stack will always wait for previous request > > > is completed by mmc_blk_rw_wait(), before sending a new request to > > > hardware, or queue a work to complete request, that will bring > > > context switching overhead, especially for high I/O per second > > > rates, to affect the IO performance. > > > > > > Thus this patch set will introduce the MMC software command queue > > > support based on command queue engine's interfaces, and set the > > > queue depth as 64 to allow more requests can be be prepared, merged > > > and inserted into IO scheduler, but we only allow 2 requests in > > > flight, that is enough to let the irq handler always trigger the > > > next request without a context switch, as well as avoiding a long latency. > > > > > > Moreover we can expand the MMC software queue interface to support > > > MMC packed request or packed command instead of adding new > > > interfaces, according to previosus discussion. > > > > > > Below are some comparison data with fio tool. The fio command I used > > > is like below with changing the '--rw' parameter and enabling the > > > direct IO flag to measure the actual hardware transfer speed in 4K block > size. > > > > > > ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read > > > --bs=4K --size=1G --group_reporting --numjobs=20 --name=test_read > > > > > > My eMMC card working at HS400 Enhanced strobe mode: > > > [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address > 0001 > > > [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB > > > [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB > > > [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB > > > [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, > chardev (248:0) > > > > > > 1. Without MMC software queue > > > I tested 5 times for each case and output a average speed. > > > > > > 1) Sequential read: > > > Speed: 59.4MiB/s, 63.4MiB/s, 57.5MiB/s, 57.2MiB/s, 60.8MiB/s Average > > > speed: 59.66MiB/s > > > > > > 2) Random read: > > > Speed: 26.9MiB/s, 26.9MiB/s, 27.1MiB/s, 27.1MiB/s, 27.2MiB/s Average > > > speed: 27.04MiB/s > > > > > > 3) Sequential write: > > > Speed: 71.6MiB/s, 72.5MiB/s, 72.2MiB/s, 64.6MiB/s, 67.5MiB/s Average > > > speed: 69.68MiB/s > > > > > > 4) Random write: > > > Speed: 36.3MiB/s, 35.4MiB/s, 38.6MiB/s, 34MiB/s, 35.5MiB/s Average > > > speed: 35.96MiB/s > > > > > > 2. With MMC software queue > > > I tested 5 times for each case and output a average speed. > > > > > > 1) Sequential read: > > > Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s Average > > > speed: 60.68MiB/s > > > > > > 2) Random read: > > > Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s Average > > > speed: 31.36MiB/s > > > > > > 3) Sequential write: > > > Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s Average > > > speed: 71.66MiB/s > > > > > > 4) Random write: > > > Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s Average > > > speed: 68.76MiB/s > > > > > > Form above data, we can see the MMC software queue can help to > > > improve some performance obviously for random read and write, though > > > no obvious improvement for sequential read and write. > > > > > > Any comments are welcome. Thanks a lot. > > > Hi Baolin, I refer to your code, and add the software queue support on i.MX based on the Linux next-20200602, but unfortunately, I see an obvious performance drop when change to use software queue. I test on our imx850-evk board, with eMMC soldered. >From the result listing below, only random write has a little performance improve, for others, seems performance drop a lot. I noticed that, this software queue need no-removable card, any other limitation? For host? >From the code logic, software queue complete the request in irq handler, seems no other change, I do not figure out why this will trigger a performance drop on my platform. Any comment would be appreciate! Without software queue, normal read/write method: Sequential read: 56MB/s Random read: 23.5MB/s Sequential write: 43.7MB/s Random write: 19MB/s With mmc software queue: Sequential read: 33.5MB/s Random read: 18.7 MB/s Sequential write: 37.7MB/s Random write: 19.8MB/s Here, I also list my change code to support software queue diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index eb85237bf2d6..996b8cc5c381 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -254,6 +254,7 @@ config MMC_SDHCI_ESDHC_IMX depends on MMC_SDHCI_PLTFM select MMC_SDHCI_IO_ACCESSORS select MMC_CQHCI + select MMC_HSQ help This selects the Freescale eSDHC/uSDHC controller support found on i.MX25, i.MX35 i.MX5x and i.MX6x. diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index 1d7f84b23a22..6f163695b08d 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -29,6 +29,7 @@ #include "sdhci-pltfm.h" #include "sdhci-esdhc.h" #include "cqhci.h" +#include "mmc_hsq.h" #define ESDHC_SYS_CTRL_DTOCV_MASK 0x0f #define ESDHC_CTRL_D3CD 0x08 @@ -1220,6 +1221,15 @@ static u32 esdhc_cqhci_irq(struct sdhci_host *host, u32 intmask) return 0; } +static void esdhc_request_done(struct sdhci_host *host, struct mmc_request *mrq) +{ + /* Validate if the request was from software queue firstly. */ + if (mmc_hsq_finalize_request(host->mmc, mrq)) + return; + + mmc_request_done(host->mmc, mrq); +} + static struct sdhci_ops sdhci_esdhc_ops = { .read_l = esdhc_readl_le, .read_w = esdhc_readw_le, @@ -1237,6 +1247,7 @@ static struct sdhci_ops sdhci_esdhc_ops = { .set_uhs_signaling = esdhc_set_uhs_signaling, .reset = esdhc_reset, .irq = esdhc_cqhci_irq, + .request_done = esdhc_request_done, }; static const struct sdhci_pltfm_data sdhci_esdhc_imx_pdata = { @@ -1301,6 +1312,19 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) writel(tmp, host->ioaddr + ESDHC_VEND_SPEC2); host->quirks &= ~SDHCI_QUIRK_NO_BUSY_IRQ; + + /* + * On i.MX8MM, we are running Dual Linux OS, with 1st Linux using SD Card + * as rootfs storage, 2nd Linux using eMMC as rootfs storage. We let the + * the 1st linux configure power/clock for the 2nd Linux. + * + * When the 2nd Linux is booting into rootfs stage, we let the 1st Linux + * to destroy the 2nd linux, then restart the 2nd linux, we met SDHCI dump. + * After we clear the pending interrupt and halt CQCTL, issue gone. + */ + tmp = cqhci_readl(cq_host, CQHCI_IS); + cqhci_writel(cq_host, tmp, CQHCI_IS); + cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); } if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) { @@ -1351,9 +1375,6 @@ static void sdhci_esdhc_imx_hwinit(struct sdhci_host *host) * After we clear the pending interrupt and halt CQCTL, issue gone. */ if (cq_host) { - tmp = cqhci_readl(cq_host, CQHCI_IS); - cqhci_writel(cq_host, tmp, CQHCI_IS); - cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL); } } } @@ -1555,6 +1576,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) struct sdhci_pltfm_host *pltfm_host; struct sdhci_host *host; struct cqhci_host *cq_host; + struct mmc_hsq *hsq; int err; struct pltfm_imx_data *imx_data; @@ -1664,6 +1686,16 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) err = cqhci_init(cq_host, host->mmc, false); if (err) goto disable_ahb_clk; + } else if (esdhc_is_usdhc(imx_data)) { + hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL); + if (!hsq) { + err = -ENOMEM; + goto disable_ahb_clk; + } + + err = mmc_hsq_init(hsq, host->mmc); + if (err) + goto disable_ahb_clk; } if (of_id) @@ -1673,6 +1705,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) if (err) goto disable_ahb_clk; + if (!mmc_card_is_removable(host->mmc)) + host->mmc_host_ops.request_atomic = sdhci_request_atomic; + else + host->always_defer_done = true; + sdhci_esdhc_imx_hwinit(host); err = sdhci_add_host(host); @@ -1737,6 +1774,8 @@ static int sdhci_esdhc_suspend(struct device *dev) ret = cqhci_suspend(host->mmc); if (ret) return ret; + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_suspend(host->mmc); } if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) && @@ -1764,6 +1803,8 @@ static int sdhci_esdhc_suspend(struct device *dev) static int sdhci_esdhc_resume(struct device *dev) { struct sdhci_host *host = dev_get_drvdata(dev); + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host); int ret; ret = pinctrl_pm_select_default_state(dev); @@ -1777,8 +1818,11 @@ static int sdhci_esdhc_resume(struct device *dev) if (ret) return ret; - if (host->mmc->caps2 & MMC_CAP2_CQE) + if (host->mmc->caps2 & MMC_CAP2_CQE) { ret = cqhci_resume(host->mmc); + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_resume(host->mmc); + } if (!ret) ret = mmc_gpio_set_cd_wake(host->mmc, false); @@ -1799,6 +1843,8 @@ static int sdhci_esdhc_runtime_suspend(struct device *dev) ret = cqhci_suspend(host->mmc); if (ret) return ret; + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_suspend(host->mmc); } ret = sdhci_runtime_suspend_host(host); @@ -1851,8 +1897,11 @@ static int sdhci_esdhc_runtime_resume(struct device *dev) if (err) goto disable_ipg_clk; - if (host->mmc->caps2 & MMC_CAP2_CQE) + if (host->mmc->caps2 & MMC_CAP2_CQE) { err = cqhci_resume(host->mmc); + } else if (esdhc_is_usdhc(imx_data)) { + mmc_hsq_resume(host->mmc); + } return err; > > > Changes from v8: > > > - Add more description in the commit message. > > > - Optimize the failure log when calling cqe_enable(). > > > > > > Changes from v7: > > > - Add reviewed tag from Arnd. > > > - Use the 'hsq' acronym for varibles and functions in the core layer. > > > - Check the 'card->ext_csd.cmdq_en' in cqhci.c to make sure the CQE > > > can work normally. > > > - Add a new patch to enable the host software queue for the SD card. > > > - Use the default MMC queue depth for host software queue. > > > > > > Changes from v6: > > > - Change the patch order and set host->always_defer_done = true for > > > the Spreadtrum host driver. > > > > > > Changes from v5: > > > - Modify the condition of defering to complete request suggested by > Adrian. > > > > > > Changes from v4: > > > - Add a seperate patch to introduce a variable to defer to complete > > > data requests for some host drivers, when using host software queue. > > > > > > Changes from v3: > > > - Use host software queue instead of sqhci. > > > - Fix random config building issue. > > > - Change queue depth to 32, but still only allow 2 requests in flight. > > > - Update the testing data. > > > > > > Changes from v2: > > > - Remove reference to 'struct cqhci_host' and 'struct cqhci_slot', > > > instead adding 'struct sqhci_host', which is only used by software queue. > > > > > > Changes from v1: > > > - Add request_done ops for sdhci_ops. > > > - Replace virtual command queue with software queue for functions > > > and variables. > > > - Rename the software queue file and add sqhci.h header file. > > > > > > Baolin Wang (5): > > > mmc: Add MMC host software queue support > > > mmc: core: Enable the MMC host software queue for the SD card > > > mmc: host: sdhci: Add request_done ops for struct sdhci_ops > > > mmc: host: sdhci: Add a variable to defer to complete requests if > > > needed > > > mmc: host: sdhci-sprd: Add software queue support > > > > > > drivers/mmc/core/block.c | 61 ++++++++ > > > drivers/mmc/core/mmc.c | 18 ++- > > > drivers/mmc/core/queue.c | 22 ++- > > > drivers/mmc/core/sd.c | 10 ++ > > > drivers/mmc/host/Kconfig | 8 + > > > drivers/mmc/host/Makefile | 1 + > > > drivers/mmc/host/cqhci.c | 8 +- > > > drivers/mmc/host/mmc_hsq.c | 343 > +++++++++++++++++++++++++++++++++++++++++ > > > drivers/mmc/host/mmc_hsq.h | 30 ++++ > > > drivers/mmc/host/sdhci-sprd.c | 28 ++++ > > > drivers/mmc/host/sdhci.c | 14 +- > > > drivers/mmc/host/sdhci.h | 3 + > > > include/linux/mmc/host.h | 3 + > > > 13 files changed, 534 insertions(+), 15 deletions(-) create mode > > > 100644 drivers/mmc/host/mmc_hsq.c create mode 100644 > > > drivers/mmc/host/mmc_hsq.h > > > > > > -- > > > 1.7.9.5 > > > > > > > Applied for next, thanks! Also, thanks for your patience while moving > > forward during the reviews! > > I am very appreciated for you and Arnd's good sugestion when introducing the > hsq. > > > > > Note, I did some amending of patch1 to resolve some checkpatch > > warnings. SPDX licence and Kconfig help texts, please have a look and > > tell if there are something that doesn't look good. > > Thanks for your help and looks good to me.