Hi Vinod,

On 1/12/21 12:16 PM, Vinod Koul wrote:
> On 14-12-20, 10:13, Peter Ujfalusi wrote:
>> The UDMA and BCDMA can provide higher throughput if the burst_size of the
>> channel is changed from its default (which is 64 bytes) for Ultra-high
>> and high capacity channels.
>>
>> This performance benefit is even more visible when the buffers are aligned
>> with the burst_size configuration.
>>
>> The am654 does not have a way to change the burst size, but it is using
>> 64 bytes burst, so increasing the copy_align from 8 bytes to 64 (and
>> clients taking that into account) can increase the throughput as well.
>>
>> Numbers gathered on j721e:
>> echo 8000000 > /sys/module/dmatest/parameters/test_buf_size
>> echo 2000 > /sys/module/dmatest/parameters/timeout
>> echo 50 > /sys/module/dmatest/parameters/iterations
>> echo 1 > /sys/module/dmatest/parameters/max_channels
>>
>> Prior to this patch: ~1.3 GB/s
>> After this patch: ~1.8 GB/s
>> with 1 byte alignment: ~1.7 GB/s
>>
>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@xxxxxx>
>> ---
>>  drivers/dma/ti/k3-udma.c | 115 +++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 110 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
>> index 87157cbae1b8..54e4ccb1b37e 100644
>> --- a/drivers/dma/ti/k3-udma.c
>> +++ b/drivers/dma/ti/k3-udma.c
>> @@ -121,6 +121,11 @@ struct udma_oes_offsets {
>>  #define UDMA_FLAG_PDMA_ACC32		BIT(0)
>>  #define UDMA_FLAG_PDMA_BURST		BIT(1)
>>  #define UDMA_FLAG_TDTYPE		BIT(2)
>> +#define UDMA_FLAG_BURST_SIZE		BIT(3)
>> +#define UDMA_FLAGS_J7_CLASS		(UDMA_FLAG_PDMA_ACC32 | \
>> +					 UDMA_FLAG_PDMA_BURST | \
>> +					 UDMA_FLAG_TDTYPE | \
>> +					 UDMA_FLAG_BURST_SIZE)
>>
>>  struct udma_match_data {
>>  	enum k3_dma_type type;
>> @@ -128,6 +133,7 @@ struct udma_match_data {
>>  	bool enable_memcpy_support;
>>  	u32 flags;
>>  	u32 statictr_z_mask;
>> +	u8 burst_size[3];
>>  };
>>
>>  struct udma_soc_data {
>> @@ -436,6 +442,18 @@ static void k3_configure_chan_coherency(struct dma_chan *chan, u32 asel)
>>  	}
>>  }
>>
>> +static u8 udma_get_chan_tpl_index(struct udma_tpl *tpl_map, int chan_id)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < tpl_map->levels; i++) {
>> +		if (chan_id >= tpl_map->start_idx[i])
>> +			return i;
>> +	}
>
> Braces seem not required

True, they are not strictly needed, but I prefer to keep them when the loop
body contains a condition.

>> +
>> +	return 0;
>> +}
>> +
>>  static void udma_reset_uchan(struct udma_chan *uc)
>>  {
>>  	memset(&uc->config, 0, sizeof(uc->config));
>> @@ -1811,6 +1829,7 @@ static int udma_tisci_m2m_channel_config(struct udma_chan *uc)
>>  	const struct ti_sci_rm_udmap_ops *tisci_ops = tisci_rm->tisci_udmap_ops;
>>  	struct udma_tchan *tchan = uc->tchan;
>>  	struct udma_rchan *rchan = uc->rchan;
>> +	u8 burst_size = 0;
>>  	int ret = 0;
>>
>>  	/* Non synchronized - mem to mem type of transfer */
>> @@ -1818,6 +1837,12 @@ static int udma_tisci_m2m_channel_config(struct udma_chan *uc)
>>  	struct ti_sci_msg_rm_udmap_tx_ch_cfg req_tx = { 0 };
>>  	struct ti_sci_msg_rm_udmap_rx_ch_cfg req_rx = { 0 };
>>
>> +	if (ud->match_data->flags & UDMA_FLAG_BURST_SIZE) {
>> +		u8 tpl = udma_get_chan_tpl_index(&ud->tchan_tpl, tchan->id);
>
> Can we define variable at function start please

'tpl' is only used within this if branch, so declaring it there looks a bit
cleaner IMHO, but if you insist, I can move the definition.

...

>> +static enum dmaengine_alignment udma_get_copy_align(struct udma_dev *ud)
>> +{
>> +	const struct udma_match_data *match_data = ud->match_data;
>> +	u8 tpl;
>> +
>> +	if (!match_data->enable_memcpy_support)
>> +		return DMAENGINE_ALIGN_8_BYTES;
>> +
>> +	/* Get the highest TPL level the device supports for memcpy */
>> +	if (ud->bchan_cnt) {
>> +		tpl = udma_get_chan_tpl_index(&ud->bchan_tpl, 0);
>> +	} else if (ud->tchan_cnt) {
>> +		tpl = udma_get_chan_tpl_index(&ud->tchan_tpl, 0);
>> +	} else {
>> +		return DMAENGINE_ALIGN_8_BYTES;
>> +	}
>
> Braces seem not required

Very true.
>
>> +
>> +	switch (match_data->burst_size[tpl]) {
>> +		case TI_SCI_RM_UDMAP_CHAN_BURST_SIZE_256_BYTES:
>> +			return DMAENGINE_ALIGN_256_BYTES;
>> +		case TI_SCI_RM_UDMAP_CHAN_BURST_SIZE_128_BYTES:
>> +			return DMAENGINE_ALIGN_128_BYTES;
>> +		case TI_SCI_RM_UDMAP_CHAN_BURST_SIZE_64_BYTES:
>> +			fallthrough;
>> +		default:
>> +			return DMAENGINE_ALIGN_64_BYTES;
>
> ah, we are supposed to have case at same indent as switch, pls run
> checkpatch to have these flagged off

Yes, they should be. The other me did a sloppy job for sure; this should have
been screaming even without checkpatch...
This was done in a rush during the last few days to close out the backlog item
that got the most votes.

-- 
Péter
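For anyone following the thread, here is a minimal stand-alone C sketch of how the two quoted helpers fit together: the TPL lookup picks a throughput level from a channel id, and that level's burst size decides the advertised memcpy alignment. It is not driver code. The struct, the enum values and the channel layout used below are hypothetical stand-ins, and the sketch assumes start_idx[] is stored in descending order, as the quoted loop implies. Under that assumption channel id 0 falls into the highest level, which matches the quoted udma_get_copy_align() passing 0. The switch uses the case-at-switch-indent layout that checkpatch expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TI SCI burst size encodings. */
enum burst_size { BURST_64 = 1, BURST_128 = 2, BURST_256 = 3 };

/* Hypothetical alignment values, expressed as log2(bytes). */
enum copy_align { ALIGN_64 = 6, ALIGN_128 = 7, ALIGN_256 = 8 };

/* Hypothetical simplification of struct udma_tpl. */
struct tpl_map {
	int levels;       /* number of throughput levels, 1..3 */
	int start_idx[3]; /* first channel id of each level, assumed descending */
};

/* Same loop as the quoted udma_get_chan_tpl_index(): the first level whose
 * start index the channel id reaches is returned. */
static uint8_t chan_tpl_index(const struct tpl_map *map, int chan_id)
{
	int i;

	assert(map->levels >= 1 && map->levels <= 3);
	for (i = 0; i < map->levels; i++) {
		if (chan_id >= map->start_idx[i])
			return (uint8_t)i;
	}

	return 0;
}

/* Map a burst size to a copy alignment, defaulting to 64 bytes. */
static enum copy_align burst_to_align(enum burst_size burst)
{
	switch (burst) {
	case BURST_256:
		return ALIGN_256;
	case BURST_128:
		return ALIGN_128;
	case BURST_64:
	default:
		return ALIGN_64;
	}
}

/* Alignment for a channel, given hypothetical per-level burst sizes. */
static enum copy_align copy_align_for_chan(const struct tpl_map *map,
					   const enum burst_size burst[3],
					   int chan_id)
{
	return burst_to_align(burst[chan_tpl_index(map, chan_id)]);
}
```

With levels = 3 and start_idx = {8, 2, 0}, channel 0 resolves to level 2, channel 5 to level 1 and channel 10 to level 0.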