I trimmed the quote to make it easier to follow. I intentionally skipped some style comments from Vinod in this reply. Hopefully I didn't miss anything. > > +#define SWITCHTEC_DMA_SQ_SIZE SZ_32K > > +#define SWITCHTEC_DMA_CQ_SIZE SZ_32K > > + > > +#define SWITCHTEC_DMA_RING_SIZE SWITCHTEC_DMA_SQ_SIZE > > + > > +static int wait_for_chan_status(struct chan_hw_regs __iomem > > *chan_hw, u32 mask, > > + bool set) > > single line would read much better! (here and few other places) > > Also run checkpatch with --strict option, that will help catch style > inconsistencies (i am seeing some below) Ok. > > > +{ > > + u32 status; > > + int retry = 100; > > + int ret = -EIO; > > + > > + do { > > + status = readl(&chan_hw->status); > > + if ((set && (status & mask)) || > > + (!set && !(status & mask))) { > > + ret = 0; > > + break; > > + } > > why not use readl_poll_timeout() Good point. Will use readl_poll_timeout_atomic(). > > > + > > +static int disable_channel(struct switchtec_dma_chan *swdma_chan) > > +{ > > + u32 valid_en_se; > > + struct chan_fw_regs __iomem *chan_fw = swdma_chan- > > >mmio_chan_fw; > > + struct pci_dev *pdev; > > + > > + rcu_read_lock(); > > + pdev = rcu_dereference(swdma_chan->swdma_dev->pdev); > > + if (!pdev) { > > + rcu_read_unlock(); > > + return -ENODEV; > > + } > > + > > + valid_en_se = readl(&chan_fw->valid_en_se); > > + valid_en_se &= ~SWITCHTEC_CHAN_ENABLE; > > + > > + writel(valid_en_se, &chan_fw->valid_en_se); > > + > > + rcu_read_unlock(); > > + return 0; > > only diff b/w enable/disable is bit ops, I guess we can have a common > fn > with argument for enable/disable I'll make these two functions wrappers to another function which takes enable/disable flag as parameter. > > > +static struct dma_async_tx_descriptor > > *switchtec_dma_prep_wimm_data( > > + struct dma_chan *c, dma_addr_t dma_dst, u64 data, > > + unsigned long flags) > > + __acquires(swdma_chan->submit_lock) > > +{ > > can you please explain what is meant by wimm data and how it us used? Sure. Generally it's just an implementation for the device_prep_dma_imm_data() callback. I think it's an alternative method to prep_mem and would be more convenient to use when the write is 8- byte and the data to be moved is not in a DMA mapped memory location. For example, we write to a doorbell register with the value from a local variable which is not associated with a DMA address. > > > > > + > > +static dma_cookie_t switchtec_dma_tx_submit( > > + struct dma_async_tx_descriptor *desc) > > + __releases(swdma_chan->submit_lock) > > +{ > > + struct switchtec_dma_chan *swdma_chan = > > + to_switchtec_dma_chan(desc->chan); > > + dma_cookie_t cookie; > > + > > + cookie = dma_cookie_assign(desc); > > + > > + spin_unlock_bh(&swdma_chan->submit_lock); > > I dont this is correct, why would you prevent users from preparing > multiple descriptors..? The driver implements the SQ with a pre-allocated buffer which means the descriptor we are preparing has a determined slot in the queue before we submitting it. To align with the way how cookies complete, I have to make sure that the descriptors are prepared and submitted in order. I think it's also the way some other DMA drivers work, like ioat, plx. > > > + size = SWITCHTEC_DMA_SQ_SIZE * sizeof(*swdma_chan->hw_sq); > > + swdma_chan->hw_sq = dma_alloc_coherent(swdma_dev- > > >dma_dev.dev, size, > > + &swdma_chan- > > >dma_addr_sq, > > + GFP_KERNEL); > > this can be called from atomic context, GFP_NOWAIT pls Sure. > > > > > + /* set sq/cq */ > > + writel(lower_32_bits(swdma_chan->dma_addr_sq), &chan_fw- > > >sq_base_lo); > > + writel(upper_32_bits(swdma_chan->dma_addr_sq), &chan_fw- > > >sq_base_hi); > > + writel(lower_32_bits(swdma_chan->dma_addr_cq), &chan_fw- > > >cq_base_lo); > > + writel(upper_32_bits(swdma_chan->dma_addr_cq), &chan_fw- > > >cq_base_hi); > > + > > + writew(SWITCHTEC_DMA_SQ_SIZE, &swdma_chan->mmio_chan_fw- > > >sq_size); > > + writew(SWITCHTEC_DMA_CQ_SIZE, &swdma_chan->mmio_chan_fw- > > >cq_size); > > why is hardware being written in alloc call, the job is to prepare > descritor, hw write should happen when we submit the descriptors.. The descriptors in the queue are pre-allocated so I think it's proper to convey the info to hardware at this point. And in submit only the cookie is assigned. > > > + rc = dma_async_device_register(dma); > > + if (rc) { > > + pci_err(pdev, "Failed to register dma device: %d\n", > > rc); > > + goto err_chans_release_exit; > > + } > > + > > + pci_info(pdev, "Channel count: %d\n", chan_cnt); > > dev_info()? Any specific reason for switching to dev_info()? Thanks, Kelvin