Hi Niklas,

On 13-12-24, 17:59, Niklas Cassel wrote:
> Hello Vinod,
>
> I am a bit confused about the usage of the dmaengine API, and I hope that you
> could help make me slightly less confused :)

Sure thing!

> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a
> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(),
> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls.
>
> I really wish that we would remove this mutex, to get better performance.
>
> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see
> that dmaengine_prep_slave_single() (which will call
> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new
> dma_async_tx_descriptor for each function call.
>
> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will
> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds
> the descriptor to the tail of a list.
>
> I can also see that dw_edma_done_interrupt() will automatically start the
> transfer of the next descriptor (using vchan_next_desc()).
>
> So this looks like it is supposed to be asynchronous... however, if we simply
> remove the mutex, we get IOMMU errors, most likely because the DMA writes to
> an incorrect address.
>
> It looks like this is because dmaengine_prep_slave_single() really requires
> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr
> in the sconf that we are supplying to dmaengine_slave_config().)
>
> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.)
>
> So while this API is supposed to be async, to me it looks like it can only
> be used in a synchronous manner... But that seems like a really weird design.
>
> Am I missing something obvious here?

Yes, I feel NVMe is being treated as a slave transfer here, which it might not
be. This API was designed for peripherals like i2c/spi etc., where we have a
hardware address to read/write to.
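To make that concrete, here is a rough sketch of the usage pattern the slave API was designed for, where the peripheral address is fixed so the channel is configured once up front (hypothetical names like fifo_phys_addr, buf_dma and xfer_done; error handling omitted):

```c
/* Sketch only: classic slave usage for a fixed-address peripheral
 * (e.g. an SPI/I2C FIFO). Hypothetical names, no error handling.
 */
struct dma_slave_config sconf = {
	.direction	= DMA_MEM_TO_DEV,
	.dst_addr	= fifo_phys_addr,	/* fixed peripheral FIFO address */
	.dst_addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
	.dst_maxburst	= 8,
};

/* One-time channel setup... */
dmaengine_slave_config(chan, &sconf);

/* ...then prep + submit can be repeated without reconfiguring, so
 * nothing here needs to be serialized against in-flight transfers.
 */
for (i = 0; i < nr_bufs; i++) {
	desc = dmaengine_prep_slave_single(chan, buf_dma[i], buf_len[i],
					   DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
	desc->callback = xfer_done;	/* async completion via callback */
	cookie = dmaengine_submit(desc);
}
dma_async_issue_pending(chan);
```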
So dma_slave_config would pass on the transfer details for the peripheral
(address, FIFO width, depth, etc.). These are setup configuration, so you call
it once for a channel and then prepare the descriptor, submit... and repeat
prepare and submit...

I suspect that since you are passing an address which keeps changing in the
dma_slave_config, you need to guard that call and the prep_slave_single() call,
as while preparing the descriptor the driver will look up what was set up in
the configuration.

I suggest you use the prep_memcpy() API instead and pass the source and
destination; then there is no need to lock the calls...

> If the dmaengine_prep_slave_single() instead took a sconfig as a parameter,
> then it seems like it would be possible to use this API asynchronously.
>
> There does seem to be other drivers holding a mutex around
> dmaengine_slave_config() + dmaengine_prep_slave_single()... but it feels like
> this should be avoidable (at least for dw-edma), if
> dmaengine_prep_slave_single() simply took an sconf.
>
> What am I missing?
> :)
>
> Kind regards,
> Niklas
>
> On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote:
> > +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf,
> > +		struct nvmet_pciep_segment *seg, enum dma_data_direction dir)
> > +{
> > +	struct pci_epf *epf = nvme_epf->epf;
> > +	struct dma_async_tx_descriptor *desc;
> > +	struct dma_slave_config sconf = {};
> > +	struct device *dev = &epf->dev;
> > +	struct device *dma_dev;
> > +	struct dma_chan *chan;
> > +	dma_cookie_t cookie;
> > +	dma_addr_t dma_addr;
> > +	struct mutex *lock;
> > +	int ret;
> > +
> > +	switch (dir) {
> > +	case DMA_FROM_DEVICE:
> > +		lock = &nvme_epf->dma_rx_lock;
> > +		chan = nvme_epf->dma_rx_chan;
> > +		sconf.direction = DMA_DEV_TO_MEM;
> > +		sconf.src_addr = seg->pci_addr;
> > +		break;
> > +	case DMA_TO_DEVICE:
> > +		lock = &nvme_epf->dma_tx_lock;
> > +		chan = nvme_epf->dma_tx_chan;
> > +		sconf.direction = DMA_MEM_TO_DEV;
> > +		sconf.dst_addr = seg->pci_addr;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	mutex_lock(lock);
> > +
> > +	dma_dev = dmaengine_get_dma_device(chan);
> > +	dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir);
> > +	ret = dma_mapping_error(dma_dev, dma_addr);
> > +	if (ret)
> > +		goto unlock;
> > +
> > +	ret = dmaengine_slave_config(chan, &sconf);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to configure DMA channel\n");
> > +		goto unmap;
> > +	}
> > +
> > +	desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length,
> > +					   sconf.direction, DMA_CTRL_ACK);
> > +	if (!desc) {
> > +		dev_err(dev, "Failed to prepare DMA\n");
> > +		ret = -EIO;
> > +		goto unmap;
> > +	}
> > +
> > +	cookie = dmaengine_submit(desc);
> > +	ret = dma_submit_error(cookie);
> > +	if (ret) {
> > +		dev_err(dev, "DMA submit failed %d\n", ret);
> > +		goto unmap;
> > +	}
> > +
> > +	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) {
> > +		dev_err(dev, "DMA transfer failed\n");
> > +		ret = -EIO;
> > +	}
> > +
> > +	dmaengine_terminate_sync(chan);
> > +
> > +unmap:
> > +	dma_unmap_single(dma_dev, dma_addr, seg->length, dir);
> > +
> > +unlock:
> > +	mutex_unlock(lock);
> > +
> > +	return ret;
> > +}

-- 
~Vinod
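P.S. For what it's worth, a rough, untested sketch of the memcpy-style flow I have in mind, reusing the names from the quoted patch (the single dma_chan field and the flags are assumptions on my part). Since both addresses travel with each descriptor, there is no dmaengine_slave_config() call and hence nothing to serialize:

```c
/* Sketch only: untested illustration of the prep_memcpy() suggestion. */
static int nvmet_pciep_epf_dma_memcpy(struct nvmet_pciep_epf *nvme_epf,
				      struct nvmet_pciep_segment *seg,
				      enum dma_data_direction dir)
{
	struct dma_chan *chan = nvme_epf->dma_chan;	/* hypothetical field */
	struct device *dma_dev = dmaengine_get_dma_device(chan);
	struct dma_async_tx_descriptor *desc;
	dma_addr_t buf_addr, src, dst;
	dma_cookie_t cookie;
	int ret;

	buf_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir);
	ret = dma_mapping_error(dma_dev, buf_addr);
	if (ret)
		return ret;

	if (dir == DMA_FROM_DEVICE) {
		src = seg->pci_addr;
		dst = buf_addr;
	} else {
		src = buf_addr;
		dst = seg->pci_addr;
	}

	/* src and dst are carried by the descriptor itself, so no
	 * slave_config and no mutex around prep/submit.
	 */
	desc = dmaengine_prep_dma_memcpy(chan, dst, src, seg->length,
					 DMA_CTRL_ACK);
	if (!desc) {
		ret = -EIO;
		goto unmap;
	}

	cookie = dmaengine_submit(desc);
	ret = dma_submit_error(cookie);
	if (ret)
		goto unmap;

	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE)
		ret = -EIO;

unmap:
	dma_unmap_single(dma_dev, buf_addr, seg->length, dir);
	return ret;
}
```

This assumes the eDMA channels advertise DMA_MEMCPY capability; the sync wait could of course also be replaced with a completion callback to make it fully async.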