While introducing the pci_dma_prep_{single,sg}() interface, it occured to me that there are some subtle problems with the old interface and its implementation on non-cache-coherent systems. For example, a PCI driver might do the following on a buffer: pci_map_single(..., FROM_DEVICE) <perform DMA into buffer> pci_dma_sync_single(..., FROM_DEVICE) <read out of buffer> <put buffer back on queue and give bus address back to device - no explicit call to "return" the buffer to the device> <perform DMA again> pci_unmap_single(..., FROM_DEVICE) This happens in certain network drivers (tulip, pcnet32...) when determining whether the packet is small enough to make a copy (and leave the buffer on the RX ring). In the current linux-mips implementation, this has some subtle problems: pci_unmap_{single,sg}() is essentially a no-op. Thus, in the above example, a driver would break (stale cachelines from a previous pci_dma_sync_*/read would not be invalidated). One might argue that a cache invalidate should happen in the pci_unmap_single(). But then the other case (where a driver does a pci_map, DMAs, does a pci_unmap, and sends it up the stack) would require an additional cache flush/invalidate that is not needed at all. In fact, when using pci_dma_sync_{single,sg}() on a non-cache-coherent system, a cache flush/invalidate happens needlessly as the buffer had already been flushed in pci_map_{single,sg}(). These extra flushes do have significant performance hits, even if not noticeable through typical light use. As such, it seems that it makes sense to do cache flushes only in pci_map_{single,sg}() and pci_dma_prep_{single,sg}() (see Message-ID: <20020526000933.A8719@ayrnetworks.com>, [PATCH] DMA-mapping.txt) and mandate that pci_dma_prep_{single,sg}() be called when returning a "touched" buffer to the device for another DMA (between pci_map_* and pci_unmap_*). The cache flush in pci_dma_sync_{single,sg}() would be removed. The problem with this is that it will break some existing drivers that don't yet call pci_dma_prep_*() (e.g., tulip). Would it be possible to keep (but deprecate) pci_dma_sync_*() and make a similar call "pci_dma_release_{single,sg}()" which does the same thing as pci_dma_sync_*() while omitting the cache flushes? Thus, the above example would become: pci_map_single(..., FROM_DEVICE) [cache invalidate] <perform DMA into buffer> pci_dma_release_single(..., FROM_DEVICE) [no cache invalidate, but might perform PCI controller-specific operations to allow the CPU to view DMAed data] <read out of buffer> pci_dma_prep_single(..., FROM_DEVICE) [cache invalidate] <put buffer back on queue and give bus address back to device> <perform DMA again> pci_unmap_single(..., FROM_DEVICE) [no need for cache invalidate] This interface allows for: 1) A clean abstraction of what (the device or the driver) "owns" the buffer and can touch it. pci_map_* and pci_dma_prep_* "give" the buffer to the device, pci_unmap_* and pci_dma_release_* "take" the buffer from the device. Adds an entry point into platform-specific code for each of these transitions. 2) More optimal behavior in non-cache-coherent I/O situations; a buffer is never needlessly flushed/invalidated in the cache when the buffer is "owned" by the device. 3) Ability to re-use a buffer for PCI_DMA_TODEVICE operations without having to unmap/map between DMA transfers. The usage of the calls are the same, regardless of DMA direction. 4) The old interface (using pci_dma_sync_*()) will still remain for the time being, allowing for gradual migration to the new interface. Changes to Documentation/DMA-mapping.txt in vger are attached. Thanks, William --- Index: Documentation/DMA-mapping.txt =================================================================== RCS file: /vger/linux/Documentation/DMA-mapping.txt,v retrieving revision 1.17.2.2 diff -u -r1.17.2.2 DMA-mapping.txt --- Documentation/DMA-mapping.txt 5 Mar 2002 12:40:36 -0000 1.17.2.2 +++ Documentation/DMA-mapping.txt 5 Jun 2002 20:07:43 -0000 @@ -239,9 +239,9 @@ in order to get correct behavior on all platforms. -- Streaming DMA mappings which are usually mapped for one DMA transfer, - unmapped right after it (unless you use pci_dma_sync below) and for which - hardware can optimize for sequential accesses. +- Streaming DMA mappings which are usually mapped for one DMA + transfer, unmapped right after it (unless you use pci_dma_release + below) and for which hardware can optimize for sequential accesses. This of "streaming" as "asynchronous" or "outside the coherency domain". @@ -508,21 +508,46 @@ the data in between the DMA transfers, just map it with pci_map_{single,sg}, and after each DMA transfer call either: - pci_dma_sync_single(dev, dma_handle, size, direction); + pci_dma_release_single(dev, dma_handle, size, direction); or: - pci_dma_sync_sg(dev, sglist, nents, direction); + pci_dma_release_sg(dev, sglist, nents, direction); as appropriate. +Once you have touched the region (after having called pci_dma_release_*()), +and need to give it back to the device for another DMA transfer, call: + + pci_dma_prep_single(dev, dma_handle, size, direction); + +or: + + pci_dma_prep_sg(dev, sglist, nents, direction); + +This will prepare the buffer for another DMA transfer and synchronize any +CPU writes to it with the device's view if direction is set to +PCI_DMA_TODEVICE or PCI_DMA_BIDIRECTIONAL. + +Note: Prior to the introduction of pci_dma_prep_*, drivers accessing a +DMAed region after a pci_dma_sync_* (now obsolete) implicitly returned +"ownership" of the region to the device without having to expressly call a +routine to do so. In these cases, data is only read from the buffer, and +the CPU "view" of the region is synchronized again on the next call to +pci_dma_sync_*. For repeated DMA transfers to a device, pci_dma_prep_*() is +used to synchronize any changes in the region (made by the driver) with the +device. New drivers should use pci_dma_release_* in place of pci_dma_sync_* +and use pci_dma_prep_* when returning the buffer back to the device, +regardless of the DMA direction. Eventually, all calls to pci_dma_sync_* +will be eliminated. + After the last DMA transfer call one of the DMA unmap routines pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* -call till pci_unmap_*, then you don't have to call the pci_dma_sync_* -routines at all. +call till pci_unmap_*, then you don't have to call the pci_dma_release_* or +pci_dma_prep_* routines at all. Here is pseudo code which shows a situation in which you would need -to use the pci_dma_sync_*() interfaces. +to use the pci_dma_release_*() and pci_dma_prep_*() interfaces. my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) { @@ -552,7 +577,7 @@ * the DMA transfer with the CPU first * so that we see updated contents. */ - pci_dma_sync_single(cp->pdev, cp->rx_dma, cp->rx_len, + pci_dma_release_single(cp->pdev, cp->rx_dma, cp->rx_len, PCI_DMA_FROMDEVICE); /* Now it is safe to examine the buffer. */ @@ -563,7 +588,11 @@ pass_to_upper_layers(cp->rx_buf); make_and_setup_new_rx_buf(cp); } else { - /* Just give the buffer back to the card. */ + /* Prepare the buffer for another DMA transfer, + * then give the buffer back to the card. + */ + pci_dma_prep_single(cp->pdev, cp->rx_dma, + cp->rx_len, PCI_DMA_FROMDEVICE); give_rx_buf_to_card(cp); } } @@ -671,12 +700,20 @@ When the DMA transfer is complete, invoke: - void pci_dac_dma_sync_single(struct pci_dev *pdev, + void pci_dac_dma_release_single(struct pci_dev *pdev, dma64_addr_t dma_addr, size_t len, int direction); This must be done before the CPU looks at the buffer again. -This interface behaves identically to pci_dma_sync_{single,sg}(). + +After calling pci_dac_dma_release_{single,sg}, and before returning the +buffer to the device for another DMA operation, invoke: + + void pci_dac_dma_prep_single(struct pci_dev *pdev, dma64_addr_t + dma_addr, size_t len, int direction); + +These two interfaces behave identically to pci_dma_release_{single,sg}() +and pci_dma_prep_{single,sg}(), respectively. If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t the following interfaces are provided: