On Thu, 3 Feb 2022 at 09:38, Arnd Bergmann <arnd@xxxxxxxxxx> wrote: > > From: Arnd Bergmann <arnd@xxxxxxxx> > > The sa1111 platform is one of the two remaining users of the old Arm > specific "dmabounce" code, which is an earlier implementation of the > generic swiotlb. > > Linus Walleij submitted a patch that removes dmabounce support from > the ixp4xx, and I had a look at the other user, which is the sa1111 > companion chip. > > Looking at how dmabounce is used, I could narrow it down to one driver > one three machines: > > - dmabounce is only initialized on assabet/neponset, jornada720 and > badge4, which are the platforms that have an sa1111 and support > DMA on it. > > - All three of these suffer from "erratum #7" that requires only > doing DMA to half the memory sections based on one of the address > lines, in addition, the neponset also can't DMA to the RAM that > is connected to sa1111 itself. > > - the pxa lubbock machine also has sa1111, but does not support DMA > on it and does not set dmabounce. > > - only the OHCI and audio devices on sa1111 support DMA, but as > there is no audio driver for this hardware, only OHCI remains. > > In the OHCI code, I noticed that two other platforms already have > a local bounce buffer support in the form of the "local_mem" > allocator. Specifically, TMIO and SM501 use this on a few other ARM > boards with 16KB or 128KB of local SRAM that can be accessed from the > OHCI and from the CPU. > > While this is not the same problem as on sa1111, I could not find a > reason why we can't re-use the existing implementation but replace the > physical SRAM address mapping with a locally allocated DMA buffer. > > There are two main downsides: > > - rather than using a dynamically sized pool, this buffer needs > to be allocated at probe time using a fixed size. Without > having any idea of what it should be, I picked a size of > 64KB, which is between what the other two OHCI front-ends use > in their SRAM. If anyone has a better idea what that size > is reasonable, this can be trivially changed. > I suppose this is a problem if the driver falls back to ordinary DRAM once the allocation runs out? > - Previously, only USB transfers to unaddressable memory needed > to go through the bounce buffer, now all of them do, which may > impact runtime performance for USB endpoints that do a lot of > transfers. > > On the upside, the local_mem support uses write-combining buffers, > which should be a bit faster for transfers to the device compared to > normal uncached coherent memory as used in dmabounce. > Talking from past experience using this trick on a NXP ARM9 SoC ~10 years ago, using on-chip SRAM for USB DMA likely results in a significant performance boost, even without write combining, although the exact scenario obviously matters. > Cc: Linus Walleij <linus.walleij@xxxxxxxxxx> > Cc: Russell King <linux@xxxxxxxxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > Cc: Laurentiu Tudor <laurentiu.tudor@xxxxxxx> > Cc: linux-usb@xxxxxxxxxxxxxxx > Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx> > --- > Changes in v2: > > - drop check for assabet, as bounce buffers are required on > all sa1100 machines > - select CONFIG_ZONE_DMA again > - update comments and changelog text based on discussion > --- > arch/arm/common/Kconfig | 2 +- > arch/arm/common/sa1111.c | 64 ---------------------------------- > drivers/usb/core/hcd.c | 17 +++++++-- > drivers/usb/host/ohci-sa1111.c | 25 +++++++++++++ > 4 files changed, 40 insertions(+), 68 deletions(-) > > diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig > index c8e198631d41..bc158fd227e1 100644 > --- a/arch/arm/common/Kconfig > +++ b/arch/arm/common/Kconfig > @@ -1,7 +1,7 @@ > # SPDX-License-Identifier: GPL-2.0 > config SA1111 > bool > - select DMABOUNCE if !ARCH_PXA > + select ZONE_DMA if ARCH_SA1100 > > config DMABOUNCE > bool > diff --git a/arch/arm/common/sa1111.c b/arch/arm/common/sa1111.c > index 7df003b149c6..a00915883f78 100644 > --- a/arch/arm/common/sa1111.c > +++ b/arch/arm/common/sa1111.c > @@ -1391,70 +1391,9 @@ void sa1111_driver_unregister(struct sa1111_driver *driver) > } > EXPORT_SYMBOL(sa1111_driver_unregister); > > -#ifdef CONFIG_DMABOUNCE > -/* > - * According to the "Intel StrongARM SA-1111 Microprocessor Companion > - * Chip Specification Update" (June 2000), erratum #7, there is a > - * significant bug in the SA1111 SDRAM shared memory controller. If > - * an access to a region of memory above 1MB relative to the bank base, > - * it is important that address bit 10 _NOT_ be asserted. Depending > - * on the configuration of the RAM, bit 10 may correspond to one > - * of several different (processor-relative) address bits. > - * > - * This routine only identifies whether or not a given DMA address > - * is susceptible to the bug. > - * > - * This should only get called for sa1111_device types due to the > - * way we configure our device dma_masks. > - */ > -static int sa1111_needs_bounce(struct device *dev, dma_addr_t addr, size_t size) > -{ > - /* > - * Section 4.6 of the "Intel StrongARM SA-1111 Development Module > - * User's Guide" mentions that jumpers R51 and R52 control the > - * target of SA-1111 DMA (either SDRAM bank 0 on Assabet, or > - * SDRAM bank 1 on Neponset). The default configuration selects > - * Assabet, so any address in bank 1 is necessarily invalid. > - */ > - return (machine_is_assabet() || machine_is_pfs168()) && > - (addr >= 0xc8000000 || (addr + size) >= 0xc8000000); > -} > - > -static int sa1111_notifier_call(struct notifier_block *n, unsigned long action, > - void *data) > -{ > - struct sa1111_dev *dev = to_sa1111_device(data); > - > - switch (action) { > - case BUS_NOTIFY_ADD_DEVICE: > - if (dev->dev.dma_mask && dev->dma_mask < 0xffffffffUL) { > - int ret = dmabounce_register_dev(&dev->dev, 1024, 4096, > - sa1111_needs_bounce); > - if (ret) > - dev_err(&dev->dev, "failed to register with dmabounce: %d\n", ret); > - } > - break; > - > - case BUS_NOTIFY_DEL_DEVICE: > - if (dev->dev.dma_mask && dev->dma_mask < 0xffffffffUL) > - dmabounce_unregister_dev(&dev->dev); > - break; > - } > - return NOTIFY_OK; > -} > - > -static struct notifier_block sa1111_bus_notifier = { > - .notifier_call = sa1111_notifier_call, > -}; > -#endif > - > static int __init sa1111_init(void) > { > int ret = bus_register(&sa1111_bus_type); > -#ifdef CONFIG_DMABOUNCE > - if (ret == 0) > - bus_register_notifier(&sa1111_bus_type, &sa1111_bus_notifier); > -#endif > if (ret == 0) > platform_driver_register(&sa1111_device_driver); > return ret; > @@ -1463,9 +1402,6 @@ static int __init sa1111_init(void) > static void __exit sa1111_exit(void) > { > platform_driver_unregister(&sa1111_device_driver); > -#ifdef CONFIG_DMABOUNCE > - bus_unregister_notifier(&sa1111_bus_type, &sa1111_bus_notifier); > -#endif > bus_unregister(&sa1111_bus_type); > } > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index 3c7c64ff3c0a..8417baedc89c 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -1260,7 +1260,8 @@ void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb) > EXPORT_SYMBOL_GPL(usb_hcd_unlink_urb_from_ep); > > /* > - * Some usb host controllers can only perform dma using a small SRAM area. > + * Some usb host controllers can only perform dma using a small SRAM area, > + * or have restrictions on addressable DRAM. > * The usb core itself is however optimized for host controllers that can dma > * using regular system memory - like pci devices doing bus mastering. > * > @@ -3095,8 +3096,18 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr, > if (IS_ERR(hcd->localmem_pool)) > return PTR_ERR(hcd->localmem_pool); > > - local_mem = devm_memremap(hcd->self.sysdev, phys_addr, > - size, MEMREMAP_WC); > + /* > + * if a physical SRAM address was passed, map it, otherwise > + * allocate system memory as a buffer. > + */ > + if (phys_addr) > + local_mem = devm_memremap(hcd->self.sysdev, phys_addr, > + size, MEMREMAP_WC); > + else > + local_mem = dmam_alloc_attrs(hcd->self.sysdev, size, &dma, > + GFP_KERNEL, > + DMA_ATTR_WRITE_COMBINE); > + > if (IS_ERR(local_mem)) > return PTR_ERR(local_mem); > > diff --git a/drivers/usb/host/ohci-sa1111.c b/drivers/usb/host/ohci-sa1111.c > index 137f66f6977f..0da2badf0658 100644 > --- a/drivers/usb/host/ohci-sa1111.c > +++ b/drivers/usb/host/ohci-sa1111.c > @@ -206,6 +206,31 @@ static int ohci_hcd_sa1111_probe(struct sa1111_dev *dev) > goto err1; > } > > + /* > + * According to the "Intel StrongARM SA-1111 Microprocessor Companion > + * Chip Specification Update" (June 2000), erratum #7, there is a > + * significant bug in the SA1111 SDRAM shared memory controller. If > + * an access to a region of memory above 1MB relative to the bank base, > + * it is important that address bit 10 _NOT_ be asserted. Depending > + * on the configuration of the RAM, bit 10 may correspond to one > + * of several different (processor-relative) address bits. > + * > + * Section 4.6 of the "Intel StrongARM SA-1111 Development Module > + * User's Guide" mentions that jumpers R51 and R52 control the > + * target of SA-1111 DMA (either SDRAM bank 0 on Assabet, or > + * SDRAM bank 1 on Neponset). The default configuration selects > + * Assabet, so any address in bank 1 is necessarily invalid. > + * > + * As a workaround, use a bounce buffer in addressable memory > + * as local_mem, relying on ZONE_DMA to provide an area that > + * fits within the above constraints. > + * > + * SZ_64K is an estimate for what size this might need. > + */ > + ret = usb_hcd_setup_local_mem(hcd, 0, 0, SZ_64K); > + if (ret) > + goto err1; > + > if (!request_mem_region(hcd->rsrc_start, hcd->rsrc_len, hcd_name)) { > dev_dbg(&dev->dev, "request_mem_region failed\n"); > ret = -EBUSY; > -- > 2.29.2 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel