On 12/01/18 16:19, Linus Walleij wrote:
> The bounce buffer is gone from the MMC core, and now we found out
> that there are some (crippled) i.MX boards out there that have broken
> ADMA (cannot do scatter-gather), and broken PIO so they must use
> SDMA. Closer examination shows a less significant slowdown also on
> SDMA-only capable laptop hosts.
>
> SDMA sets down the number of segments to one, so that each segment
> gets turned into a singular request that ping-pongs to the block
> layer before the next request/segment is issued.
>
> Apparently it happens a lot that the block layer sends requests
> that include a lot of physically discontiguous segments. My guess
> is that this phenomenon is coming from the file system.
>
> These devices that cannot handle scatterlists in hardware can see
> major benefits from a DMA-contiguous bounce buffer.
>
> This patch accumulates those fragmented scatterlists in a physically
> contiguous bounce buffer so that we can issue bigger DMA data chunks
> to/from the card.
>
> When tested with this PCI-integrated host (1217:8221) that
> only supports SDMA:
> 0b:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS
> SD/MMC Card Reader Controller (rev 05)
> this patch gave ~1 Mbyte/s improved throughput on large reads and
> writes when testing with iozone, compared to without the patch.
>
> On the SDHCI controllers on the crippled i.MX 25 and i.MX 35,
> the patch restores the performance to what it was before we removed
> the bounce buffers, and then some: performance is better than ever
> because we now allocate a bounce buffer the size of the maximum
> single request the SDMA engine can handle. On the PCI laptop this
> is 256K, whereas with the old bounce buffer code it was 64K max.
>
> Cc: Benjamin Beckmeyer <beckmeyer.b@xxxxxxxxx>
> Cc: Pierre Ossman <pierre@xxxxxxxxx>
> Cc: Benoît Thébaudeau <benoit@xxxxxxxxxxx>
> Cc: Fabio Estevam <fabio.estevam@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling")
> Signed-off-by: Linus Walleij <linus.walleij@xxxxxxxxxx>

Thanks for doing this. A few comments below.

> ---
> ChangeLog v4->v5:
> - Go back to dma_alloc_coherent() as this apparently works better.
> - Keep the other changes, cap for 64KB, fall back to single segments.
> - Requesting a test of this on i.MX. (Sorry Benjamin.)
> ChangeLog v3->v4:
> - Cap the bounce buffer to 64KB instead of the biggest segment
>   as we experience diminishing returns with buffers > 64KB.
> - Instead of using dma_alloc_coherent(), use good old devm_kmalloc()
>   and issue dma_sync_single_for*() to explicitly switch
>   ownership between CPU and the device. This way we exercise the
>   cache better and may consume less CPU.
> - Bail out with single segments if we cannot allocate a bounce
>   buffer.
> - Tested on the PCI SDHCI on my laptop: requesting a new test
>   on i.MX from Benjamin. (Please!)
> ChangeLog v2->v3:
> - Rewrite the commit message a bit
> - Add Benjamin's Tested-by
> - Add Fixes and stable tags
> ChangeLog v1->v2:
> - Skip the remapping and fiddling with the buffer, instead use
>   dma_alloc_coherent() and use a simple, coherent bounce buffer.
> - Couple kernel messages to ->parent of the mmc_host as it relates
>   to the hardware characteristics.
> ---
>  drivers/mmc/host/sdhci.c | 105 +++++++++++++++++++++++++++++++++++++++++++----
>  drivers/mmc/host/sdhci.h |   3 ++
>  2 files changed, 100 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index e9290a3439d5..4e594d5e3185 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -19,6 +19,7 @@
>  #include <linux/io.h>
>  #include <linux/module.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/sizes.h>
>  #include <linux/slab.h>
>  #include <linux/scatterlist.h>
>  #include <linux/swiotlb.h>
> @@ -502,8 +503,22 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
>          if (data->host_cookie == COOKIE_PRE_MAPPED)
>                  return data->sg_count;
>
> -        sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
> -                              mmc_get_dma_dir(data));
> +        /* Bounce write requests to the bounce buffer */
> +        if (host->bounce_buffer) {
> +                if (mmc_get_dma_dir(data) == DMA_TO_DEVICE) {
> +                        /* Copy the data to the bounce buffer */
> +                        sg_copy_to_buffer(data->sg, data->sg_len,
> +                                          host->bounce_buffer,
> +                                          host->bounce_buffer_size);

Shouldn't this be the data size instead of host->bounce_buffer_size?

> +                }
> +                /* Just a dummy value */
> +                sg_count = 1;
> +        } else {
> +                /* Just access the data directly from memory */
> +                sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
> +                                      data->sg_len,
> +                                      mmc_get_dma_dir(data));
> +        }
>
>          if (sg_count == 0)
>                  return -ENOSPC;
>
> @@ -858,8 +873,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
>                                       SDHCI_ADMA_ADDRESS_HI);
>          } else {
>                  WARN_ON(sg_cnt != 1);
> -                sdhci_writel(host, sg_dma_address(data->sg),
> -                             SDHCI_DMA_ADDRESS);
> +                /* Bounce buffer goes to work */
> +                if (host->bounce_buffer)
> +                        sdhci_writel(host, host->bounce_addr,
> +                                     SDHCI_DMA_ADDRESS);
> +                else
> +                        sdhci_writel(host, sg_dma_address(data->sg),
> +                                     SDHCI_DMA_ADDRESS);
>          }
>  }
>
> @@ -2248,7 +2268,12 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq)
>
>          mrq->data->host_cookie = COOKIE_UNMAPPED;
>
> -        if (host->flags & SDHCI_REQ_USE_DMA)
> +        /*
> +         * No pre-mapping in the pre hook if we're using the bounce buffer,
> +         * for that we would need two bounce buffers since one buffer is
> +         * in flight when this is getting called.
> +         */
> +        if (host->flags & SDHCI_REQ_USE_DMA && !host->bounce_buffer)
>                  sdhci_pre_dma_transfer(host, mrq->data, COOKIE_PRE_MAPPED);
>  }
>
> @@ -2352,8 +2377,23 @@ static bool sdhci_request_done(struct sdhci_host *host)
>                  struct mmc_data *data = mrq->data;
>
>                  if (data && data->host_cookie == COOKIE_MAPPED) {
> -                        dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
> -                                     mmc_get_dma_dir(data));
> +                        if (host->bounce_buffer) {
> +                                /*
> +                                 * On reads, copy the bounced data into the
> +                                 * sglist
> +                                 */
> +                                if (mmc_get_dma_dir(data) == DMA_FROM_DEVICE) {
> +                                        sg_copy_from_buffer(data->sg,
> +                                                        data->sg_len,
> +                                                        host->bounce_buffer,
> +                                                        host->bounce_buffer_size);

And this should be data->bytes_xfered.

> +                                }
> +                        } else {
> +                                /* Unmap the raw data */
> +                                dma_unmap_sg(mmc_dev(host->mmc), data->sg,
> +                                             data->sg_len,
> +                                             mmc_get_dma_dir(data));
> +                        }
>                          data->host_cookie = COOKIE_UNMAPPED;
>                  }
>          }
>
> @@ -2636,7 +2676,12 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>           */
>          if (intmask & SDHCI_INT_DMA_END) {
>                  u32 dmastart, dmanow;
> -                dmastart = sg_dma_address(host->data->sg);
> +
> +                if (host->bounce_buffer)
> +                        dmastart = host->bounce_addr;
> +                else
> +                        dmastart = sg_dma_address(host->data->sg);

How about making a function for this, e.g. called sdhci_sdma_address()?

> +
>                  dmanow = dmastart + host->data->bytes_xfered;
>                  /*
>                   * Force update to the next DMA block boundary.
> @@ -3713,6 +3758,47 @@ int sdhci_setup_host(struct sdhci_host *host)
>           */
>          mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ?
>                          1 : 65535;
>
> +        if (mmc->max_segs == 1) {

Please make the bounce buffer allocation a separate function and put it at
the start of __sdhci_add_host().

> +                unsigned int max_blocks;

Perhaps drop the max_blocks local variable, and just set
mmc->max_segs = host->bounce_buffer_size / 512.

> +                unsigned int max_seg_size;

It would read better if you called this bounce_size or buffer_size.

> +
> +                max_seg_size = SZ_64K;
> +                if (mmc->max_req_size < max_seg_size)
> +                        max_seg_size = mmc->max_req_size;

Maybe make the bounce buffer size calculation a separate function and
follow the same logic as the old mmc_queue_calc_bouncesz().

> +                max_blocks = max_seg_size / 512;
> +                dev_info(mmc->parent,
> +                         "host only supports SDMA, activate bounce buffer\n");

There are other prints below, so this seems redundant.

> +
> +                /*
> +                 * When we just support one segment, we can get significant
> +                 * speedup by the help of a bounce buffer to group scattered
> +                 * reads/writes together.
> +                 */
> +                host->bounce_buffer = dma_alloc_coherent(mmc->parent,
> +                                                         max_seg_size,
> +                                                         &host->bounce_addr,
> +                                                         GFP_KERNEL);
> +                if (!host->bounce_buffer) {
> +                        dev_err(mmc->parent,
> +                                "failed to allocate %u bytes for bounce buffer, falling back to single segments\n",
> +                                max_seg_size);

Please use pr_err and mmc_hostname(mmc).

> +                        /*
> +                         * Exiting with zero here makes sure we proceed with
> +                         * mmc->max_segs == 1.
> +                         */
> +                        return 0;
> +                }
> +                host->bounce_buffer_size = max_seg_size;
> +
> +                /* Lie about this since we're bouncing */
> +                mmc->max_segs = max_blocks;
> +                mmc->max_seg_size = max_seg_size;

Don't you need to set max_req_size also?

> +
> +                dev_info(mmc->parent,
> +                         "bounce buffer: bounce up to %u segments into one, max segment size %u bytes\n",
> +                         max_blocks, max_seg_size);

Please use pr_info and mmc_hostname(mmc).

> +        }
> +
>          return 0;
>
>  unreg:
> @@ -3743,6 +3829,9 @@ void sdhci_cleanup_host(struct sdhci_host *host)
>                            host->align_addr);
>          host->adma_table = NULL;
>          host->align_buffer = NULL;

Please make freeing the bounce buffer a separate function.

> +        if (host->bounce_buffer)
> +                dma_free_coherent(mmc->parent, host->bounce_buffer_size,
> +                                  host->bounce_buffer, host->bounce_addr);
>  }
>  EXPORT_SYMBOL_GPL(sdhci_cleanup_host);
>
> diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
> index 54bc444c317f..865e09618d22 100644
> --- a/drivers/mmc/host/sdhci.h
> +++ b/drivers/mmc/host/sdhci.h
> @@ -440,6 +440,9 @@ struct sdhci_host {
>
>          int irq;                /* Device IRQ */
>          void __iomem *ioaddr;   /* Mapped address */
> +        char *bounce_buffer;    /* For packing SDMA reads/writes */
> +        dma_addr_t bounce_addr;
> +        size_t bounce_buffer_size;

SDMA doesn't support large transfers, so unsigned int is better than
size_t here.

>
>          const struct sdhci_ops *ops;    /* Low level hw interface */
>