On Mon, Mar 3, 2014 at 1:25 AM, <xuelin.shi@xxxxxxxxxxxxx> wrote: > From: Xuelin Shi <xuelin.shi@xxxxxxxxxxxxx> > > The RaidEngine is a new FSL hardware used for Raid5/6 acceration. > > This patch enables the RaidEngine functionality and provides > hardware offloading capability for memcpy, xor and pq computation. > It works with async_tx. > > Signed-off-by: Harninder Rai <harninder.rai@xxxxxxxxxxxxx> > Signed-off-by: Naveen Burmi <naveenburmi@xxxxxxxxxxxxx> > Signed-off-by: Xuelin Shi <xuelin.shi@xxxxxxxxxxxxx> > --- > changes for v2: > - remove ASYNC_TX_ENABLE_CHANNEL_SWITCH > - change tasklet to threaded irq > - change resource allocation to devm_xxx > > drivers/dma/Kconfig | 11 + > drivers/dma/Makefile | 1 + > drivers/dma/fsl_raid.c | 894 +++++++++++++++++++++++++++++++++++++++++++++++++ > drivers/dma/fsl_raid.h | 310 +++++++++++++++++ > 4 files changed, 1216 insertions(+) > create mode 100644 drivers/dma/fsl_raid.c > create mode 100644 drivers/dma/fsl_raid.h > > diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig > index 605b016..829f41c 100644 > --- a/drivers/dma/Kconfig > +++ b/drivers/dma/Kconfig > @@ -100,6 +100,17 @@ config FSL_DMA > EloPlus is on mpc85xx and mpc86xx and Pxxx parts, and the Elo3 is on > some Txxx and Bxxx parts. > > +config FSL_RAID > + tristate "Freescale RAID engine Support" > + depends on FSL_SOC && !FSL_DMA > + select DMA_ENGINE > + select DMA_ENGINE_RAID > + ---help--- > + Enable support for Freescale RAID Engine. RAID Engine is > + available on some QorIQ SoCs (like P5020). It has > + the capability to offload memcpy, xor and pq computation > + for raid5/6. > + > config MPC512X_DMA > tristate "Freescale MPC512x built-in DMA engine support" > depends on PPC_MPC512x || PPC_MPC831x > diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile > index a029d0f4..60b163b 100644 > --- a/drivers/dma/Makefile > +++ b/drivers/dma/Makefile > @@ -44,3 +44,4 @@ obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o > obj-$(CONFIG_TI_CPPI41) += cppi41.o > obj-$(CONFIG_K3_DMA) += k3dma.o > obj-$(CONFIG_MOXART_DMA) += moxart-dma.o > +obj-$(CONFIG_FSL_RAID) += fsl_raid.o > diff --git a/drivers/dma/fsl_raid.c b/drivers/dma/fsl_raid.c > new file mode 100644 > index 0000000..7f153aa > --- /dev/null > +++ b/drivers/dma/fsl_raid.c > @@ -0,0 +1,894 @@ > +/* > + * drivers/dma/fsl_raid.c > + * > + * Freescale RAID Engine device driver > + * > + * Author: > + * Harninder Rai <harninder.rai@xxxxxxxxxxxxx> > + * Naveen Burmi <naveenburmi@xxxxxxxxxxxxx> > + * > + * Rewrite: > + * Xuelin Shi <xuelin.shi@xxxxxxxxxxxxx> > + * > + * Copyright (c) 2010-2014 Freescale Semiconductor, Inc. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions are met: > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * * Neither the name of Freescale Semiconductor nor the > + * names of its contributors may be used to endorse or promote products > + * derived from this software without specific prior written permission. 
> + * > + * ALTERNATIVELY, this software may be distributed under the terms of the > + * GNU General Public License ("GPL") as published by the Free Software > + * Foundation, either version 2 of that License or (at your option) any > + * later version. > + * > + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED > + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE > + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY > + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES > + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; > + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + * > + * Theory of operation: > + * > + * General capabilities: > + * RAID Engine (RE) block is capable of offloading XOR, memcpy and P/Q > + * calculations required in RAID5 and RAID6 operations. RE driver > + * registers with Linux's ASYNC layer as dma driver. RE hardware > + * maintains strict ordering of the requests through chained > + * command queueing. > + * > + * Data flow: > + * Software RAID layer of Linux (MD layer) maintains RAID partitions, > + * strips, stripes etc. It sends requests to the underlying AYSNC layer > + * which further passes it to RE driver. ASYNC layer decides which request > + * goes to which job ring of RE hardware. For every request processed by > + * RAID Engine, driver gets an interrupt unless coalescing is set. The > + * per job ring interrupt handler checks the status register for errors, > + * clears the interrupt and schedules a tasklet. Main request processing > + * is done in tasklet. A software shadow copy of the HW ring is kept to > + * maintain virtual to physical translation. Based on the internal indexes > + * maintained, the tasklet picks the descriptor address from shadow copy, > + * updates the corresponding cookie, updates the outbound ring job removed > + * register in RE hardware and eventually calls the callback function. This > + * callback function gets passed as part of request from MD layer. 
> + */ > + > +#include <linux/interrupt.h> > +#include <linux/module.h> > +#include <linux/of_irq.h> > +#include <linux/of_address.h> > +#include <linux/of_platform.h> > +#include <linux/dma-mapping.h> > +#include <linux/dmapool.h> > +#include <linux/dmaengine.h> > +#include <linux/io.h> > +#include <linux/spinlock.h> > +#include <linux/slab.h> > + > +#include "dmaengine.h" > +#include "fsl_raid.h" > + > +#define MAX_XOR_SRCS 16 > +#define MAX_PQ_SRCS 16 > +#define MAX_INITIAL_DESCS 256 > +#define MAX_DESCS_LIMIT (MAX_INITIAL_DESCS * 4) > +#define FRAME_FORMAT 0x1 > +#define MAX_DATA_LENGTH (1024*1024) > + > +#define to_fsl_re_dma_desc(tx) container_of(tx, \ > + struct fsl_re_dma_async_tx_desc, async_tx) > + > +/* Add descriptors into per jr software queue - submit_q */ > +static dma_cookie_t re_jr_tx_submit(struct dma_async_tx_descriptor *tx) > +{ > + struct fsl_re_dma_async_tx_desc *desc; > + struct re_jr *jr; > + dma_cookie_t cookie; > + unsigned long flags; > + > + desc = to_fsl_re_dma_desc(tx); > + jr = container_of(tx->chan, struct re_jr, chan); > + > + spin_lock_irqsave(&jr->desc_lock, flags); > + cookie = dma_cookie_assign(tx); > + list_add_tail(&desc->node, &jr->submit_q); > + spin_unlock_irqrestore(&jr->desc_lock, flags); > + > + return cookie; > +} > + > +static void re_jr_desc_done(struct fsl_re_dma_async_tx_desc *desc) > +{ > + struct dma_chan *chan = &desc->jr->chan; > + dma_async_tx_callback callback; > + void *callback_param; > + unsigned long flags; > + > + spin_lock_irqsave(&desc->jr->desc_lock, flags); > + if (chan->completed_cookie < desc->async_tx.cookie) { > + chan->completed_cookie = desc->async_tx.cookie; > + if (chan->completed_cookie == DMA_MAX_COOKIE) > + chan->completed_cookie = DMA_MIN_COOKIE; > + } It's not clear to me why you need to roll over completed cookie to DMA_MIN_COOKIE? It will happen naturally when the cookie rolls over. > + spin_unlock_irqrestore(&desc->jr->desc_lock, flags); > + > + callback = desc->async_tx.callback; > + callback_param = desc->async_tx.callback_param; > + > + if (callback) > + callback(callback_param); > + > + dma_descriptor_unmap(&desc->async_tx); > + > + dma_run_dependencies(&desc->async_tx); You can delete this line, since channel switching is disabled, there can't be any dependencies. 
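With those two changes the completion path can also lean on the core cookie helper instead of open coding the completed_cookie update. Roughly what I have in mind (untested, just to illustrate):

static void re_jr_desc_done(struct fsl_re_dma_async_tx_desc *desc)
{
	struct re_jr *jr = desc->jr;
	dma_async_tx_callback callback;
	void *callback_param;
	unsigned long flags;

	spin_lock_irqsave(&jr->desc_lock, flags);
	/* records chan->completed_cookie, no manual rollover needed */
	dma_cookie_complete(&desc->async_tx);
	spin_unlock_irqrestore(&jr->desc_lock, flags);

	callback = desc->async_tx.callback;
	callback_param = desc->async_tx.callback_param;
	if (callback)
		callback(callback_param);

	dma_descriptor_unmap(&desc->async_tx);
	/* no dma_run_dependencies(): channel switching is disabled */
}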
> +} > + > +static void re_jr_cleanup_descs(struct re_jr *jr) > +{ > + struct fsl_re_dma_async_tx_desc *desc, *_desc; > + unsigned long flags; > + > + list_for_each_entry_safe(desc, _desc, &jr->ack_q, node) { > + spin_lock_irqsave(&jr->desc_lock, flags); > + if (async_tx_test_ack(&desc->async_tx)) > + list_move_tail(&desc->node, &jr->free_q); > + spin_unlock_irqrestore(&jr->desc_lock, flags); > + } > +} > + > +static irqreturn_t re_jr_isr_thread(int this_irq, void *data) > +{ > + struct re_jr *jr; > + struct fsl_re_dma_async_tx_desc *desc, *_desc; > + struct jr_hw_desc *hwdesc; > + unsigned long flags; > + unsigned int count; > + u32 sw_high, done_high; > + > + jr = (struct re_jr *)data; > + > + spin_lock_bh(&jr->oub_lock); > + count = RE_JR_OUB_SLOT_FULL(in_be32(&jr->jrregs->oubring_slot_full)); > + while (count--) { > + hwdesc = &jr->oub_ring_virt_addr[jr->oub_count]; > + list_for_each_entry_safe(desc, _desc, &jr->active_q, node) { > + /* compare the hw dma addr to find the completed */ > + sw_high = desc->hwdesc.lbea32 & HWDESC_ADDR_HIGH_MASK; > + done_high = hwdesc->lbea32 & HWDESC_ADDR_HIGH_MASK; > + if (sw_high == done_high && > + desc->hwdesc.addr_low == hwdesc->addr_low) > + break; > + } > + > + re_jr_desc_done(desc); > + > + jr->oub_count = (jr->oub_count + 1) & RING_SIZE_MASK; > + > + out_be32(&jr->jrregs->oubring_job_rmvd, > + RE_JR_OUB_JOB_REMOVE(1)); > + > + spin_lock_irqsave(&jr->desc_lock, flags); > + list_move_tail(&desc->node, &jr->ack_q); > + spin_unlock_irqrestore(&jr->desc_lock, flags); > + } > + spin_unlock_bh(&jr->oub_lock); > + > + re_jr_cleanup_descs(jr); > + > + return IRQ_HANDLED; > +} > + > +/* Per Job Ring interrupt handler */ > +static irqreturn_t re_jr_isr(int irq, void *data) > +{ > + struct re_jr *jr = (struct re_jr *)data; > + > + u32 irqstate, status; > + irqstate = in_be32(&jr->jrregs->jr_interrupt_status); > + if (!irqstate) > + return IRQ_NONE; > + > + /* > + * There's no way in upper layer (read MD layer) to recover from > + * error conditions except restart everything. In long term we > + * need to do something more than just crashing > + */ > + if (irqstate & RE_JR_ERROR) { > + status = in_be32(&jr->jrregs->jr_status); > + dev_err(jr->dev, "jr error irqstate: %x, status: %x\n", > + irqstate, status); > + } > + > + /* Clear interrupt */ > + out_be32(&jr->jrregs->jr_interrupt_status, RE_JR_CLEAR_INT); > + > + return IRQ_WAKE_THREAD; > +} > + > +static enum dma_status re_jr_tx_status(struct dma_chan *chan, > + dma_cookie_t cookie, struct dma_tx_state *txstate) > +{ > + enum dma_status ret; > + struct re_jr *jr = container_of(chan, struct re_jr, chan); > + > + ret = dma_cookie_status(chan, cookie, txstate); > + if (ret != DMA_COMPLETE) { > + re_jr_cleanup_descs(jr); > + ret = dma_cookie_status(chan, cookie, txstate); > + } > + > + return ret; > +} > + > +/* Copy descriptor from per jr software queue into hardware job ring */ > +void re_jr_issue_pending(struct dma_chan *chan) > +{ > + struct re_jr *jr; > + int avail; > + struct fsl_re_dma_async_tx_desc *desc, *_desc; > + unsigned long flags; > + > + jr = container_of(chan, struct re_jr, chan); > + > + if (list_empty(&jr->submit_q)) > + return; > + > + avail = RE_JR_INB_SLOT_AVAIL(in_be32(&jr->jrregs->inbring_slot_avail)); > + if (!avail) > + return; Given that we silently don't issue when the ring is full, should the interrupt handler check submit_q after it has freed up some space? 
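If so, something along these lines at the tail of re_jr_isr_thread() is what I mean (untested sketch, reusing the driver's own helpers):

	re_jr_cleanup_descs(jr);

	/* inbound slots were just freed, push anything still waiting in submit_q */
	re_jr_issue_pending(&jr->chan);

	return IRQ_HANDLED;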
> + > + spin_lock_irqsave(&jr->desc_lock, flags); > + > + list_for_each_entry_safe(desc, _desc, &jr->submit_q, node) { > + if (!avail) > + break; > + > + list_move_tail(&desc->node, &jr->active_q); > + > + memcpy(&jr->inb_ring_virt_addr[jr->inb_count], &desc->hwdesc, > + sizeof(struct jr_hw_desc)); > + > + jr->inb_count = (jr->inb_count + 1) & RING_SIZE_MASK; > + > + /* add one job into job ring */ > + out_be32(&jr->jrregs->inbring_add_job, RE_JR_INB_JOB_ADD(1)); > + avail--; > + } > + > + spin_unlock_irqrestore(&jr->desc_lock, flags); > +} > + > +void fill_cfd_frame(struct cmpnd_frame *cf, u8 index, > + size_t length, dma_addr_t addr, bool final) > +{ > + u32 efrl = 0; > + efrl |= length & CF_LENGTH_MASK; > + efrl |= final << CF_FINAL_SHIFT; > + cf[index].efrl32 |= efrl; > + cf[index].addr_low = (u32)addr; > + cf[index].addr_high = (u32)(addr >> 32); > +} > + > +static struct fsl_re_dma_async_tx_desc *re_jr_init_desc(struct re_jr *jr, > + struct fsl_re_dma_async_tx_desc *desc, void *cf, dma_addr_t paddr) > +{ > + desc->jr = jr; > + desc->async_tx.tx_submit = re_jr_tx_submit; > + dma_async_tx_descriptor_init(&desc->async_tx, &jr->chan); > + INIT_LIST_HEAD(&desc->node); > + > + desc->hwdesc.fmt32 = FRAME_FORMAT << HWDESC_FMT_SHIFT; > + desc->hwdesc.lbea32 = (paddr >> 32) & HWDESC_ADDR_HIGH_MASK; > + desc->hwdesc.addr_low = (u32)paddr; > + desc->cf_addr = cf; > + desc->cf_paddr = paddr; > + > + desc->cdb_addr = (void *)(cf + RE_CF_DESC_SIZE); > + desc->cdb_paddr = paddr + RE_CF_DESC_SIZE; > + > + return desc; > +} > + > +static struct fsl_re_dma_async_tx_desc *re_jr_alloc_desc(struct re_jr *jr, > + unsigned long flags) > +{ > + struct fsl_re_dma_async_tx_desc *desc; > + void *cf; > + dma_addr_t paddr; > + unsigned long lock_flag; > + > + if (list_empty(&jr->free_q)) { > + desc = kzalloc(sizeof(*desc), GFP_KERNEL); > + cf = dma_pool_alloc(jr->re_dev->cf_desc_pool, GFP_ATOMIC, > + &paddr); This does not make sense, how can you have an GFP_KERNEL and a GFP_ATOMIC allocation in the same context? Since you allocate these in the submission path you can't use GFP_KERNEL. These should both be GFP_NOWAIT. 
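I.e. the slow path in re_jr_alloc_desc() would become:

	desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
	cf = dma_pool_alloc(jr->re_dev->cf_desc_pool, GFP_NOWAIT,
			    &paddr);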
> + if (!desc || !cf) { > + kfree(desc); > + return NULL; > + } > + desc = re_jr_init_desc(jr, desc, cf, paddr); > + > + spin_lock_irqsave(&jr->desc_lock, lock_flag); > + list_add(&desc->node, &jr->free_q); > + jr->alloc_count++; > + spin_unlock_irqrestore(&jr->desc_lock, lock_flag); > + } > + > + spin_lock_irqsave(&jr->desc_lock, lock_flag); > + desc = list_first_entry(&jr->free_q, > + struct fsl_re_dma_async_tx_desc, node); > + list_del(&desc->node); > + spin_unlock_irqrestore(&jr->desc_lock, lock_flag); > + > + desc->async_tx.flags = flags; > + return desc; > +} > + > +static struct dma_async_tx_descriptor *re_jr_prep_genq( > + struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src, > + unsigned int src_cnt, const unsigned char *scf, size_t len, > + unsigned long flags) > +{ > + struct re_jr *jr; > + struct fsl_re_dma_async_tx_desc *desc; > + struct xor_cdb *xor; > + struct cmpnd_frame *cf; > + u32 cdb; > + unsigned int i, j; > + > + if (len > MAX_DATA_LENGTH) { > + pr_err("Length greater than %d not supported\n", > + MAX_DATA_LENGTH); > + return NULL; > + } > + > + jr = container_of(chan, struct re_jr, chan); > + desc = re_jr_alloc_desc(jr, flags); > + if (desc <= 0) > + return NULL; > + > + /* Filling xor CDB */ > + cdb = RE_XOR_OPCODE << RE_CDB_OPCODE_SHIFT; > + cdb |= (src_cnt - 1) << RE_CDB_NRCS_SHIFT; > + cdb |= RE_BLOCK_SIZE << RE_CDB_BLKSIZE_SHIFT; > + cdb |= INTERRUPT_ON_ERROR << RE_CDB_ERROR_SHIFT; > + cdb |= DATA_DEPENDENCY << RE_CDB_DEPEND_SHIFT; > + xor = desc->cdb_addr; > + xor->cdb32 = cdb; > + > + if (scf != NULL) { > + /* compute q = src0*coef0^src1*coef1^..., * is GF(8) mult */ > + for (i = 0; i < src_cnt; i++) > + xor->gfm[i] = scf[i]; > + } else { > + /* compute P, that is XOR all srcs */ > + for (i = 0; i < src_cnt; i++) > + xor->gfm[i] = 1; > + } > + > + /* Filling frame 0 of compound frame descriptor with CDB */ > + cf = desc->cf_addr; > + fill_cfd_frame(cf, 0, sizeof(struct xor_cdb), desc->cdb_paddr, 0); > + > + /* Fill CFD's 1st frame with dest buffer */ > + fill_cfd_frame(cf, 1, len, dest, 0); > + > + /* Fill CFD's rest of the frames with source buffers */ > + for (i = 2, j = 0; j < src_cnt; i++, j++) > + fill_cfd_frame(cf, i, len, src[j], 0); > + > + /* Setting the final bit in the last source buffer frame in CFD */ > + cf[i - 1].efrl32 |= 1 << CF_FINAL_SHIFT; > + > + return &desc->async_tx; > +} > + > +/* > + * Prep function for P parity calculation.In RAID Engine terminology, > + * XOR calculation is called GenQ calculation done through GenQ command > + */ > +static struct dma_async_tx_descriptor *re_jr_prep_dma_xor( > + struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src, > + unsigned int src_cnt, size_t len, unsigned long flags) > +{ > + /* NULL let genq take all coef as 1 */ > + return re_jr_prep_genq(chan, dest, src, src_cnt, NULL, len, flags); > +} > + > +/* > + * Prep function for P/Q parity calculation.In RAID Engine terminology, > + * P/Q calculation is called GenQQ done through GenQQ command > + */ > +static struct dma_async_tx_descriptor *re_jr_prep_pq( > + struct dma_chan *chan, dma_addr_t *dest, dma_addr_t *src, > + unsigned int src_cnt, const unsigned char *scf, size_t len, > + unsigned long flags) > +{ > + struct re_jr *jr; > + struct fsl_re_dma_async_tx_desc *desc; > + struct pq_cdb *pq; > + struct cmpnd_frame *cf; > + u32 cdb; > + u8 *p; > + int gfmq_len, i, j; > + > + if (len > MAX_DATA_LENGTH) { > + pr_err("Length greater than %d not supported\n", > + MAX_DATA_LENGTH); > + return NULL; > + } > + > + /* > + * RE requires at least 2 
sources, if given only one source, we pass the > + * second source same as the first one. > + * With only one source, generating P is meaningless, only generate Q. > + */ > + if (src_cnt == 1) { > + struct dma_async_tx_descriptor *tx; > + dma_addr_t dma_src[2]; > + unsigned char coef[2]; > + > + dma_src[0] = *src; > + coef[0] = *scf; > + dma_src[1] = *src; > + coef[1] = 0; > + tx = re_jr_prep_genq(chan, dest[1], dma_src, 2, coef, len, > + flags); > + if (tx) > + desc = to_fsl_re_dma_desc(tx); > + > + return tx; > + } > + > + /* > + * During RAID6 array creation, Linux's MD layer gets P and Q > + * calculated separately in two steps. But our RAID Engine has > + * the capability to calculate both P and Q with a single command > + * Hence to merge well with MD layer, we need to provide a hook > + * here and call re_jq_prep_genq() function > + */ > + > + if (flags & DMA_PREP_PQ_DISABLE_P) > + return re_jr_prep_genq(chan, dest[1], src, src_cnt, > + scf, len, flags); > + > + jr = container_of(chan, struct re_jr, chan); > + desc = re_jr_alloc_desc(jr, flags); > + if (desc <= 0) > + return NULL; > + > + /* Filling GenQQ CDB */ > + cdb = RE_PQ_OPCODE << RE_CDB_OPCODE_SHIFT; > + cdb |= (src_cnt - 1) << RE_CDB_NRCS_SHIFT; > + cdb |= RE_BLOCK_SIZE << RE_CDB_BLKSIZE_SHIFT; > + cdb |= BUFFERABLE_OUTPUT << RE_CDB_BUFFER_SHIFT; > + cdb |= DATA_DEPENDENCY << RE_CDB_DEPEND_SHIFT; > + > + pq = desc->cdb_addr; > + pq->cdb32 = cdb; > + > + p = pq->gfm_q1; > + /* Init gfm_q1[] */ > + for (i = 0; i < src_cnt; i++) > + p[i] = 1; > + > + /* Align gfm[] to 32bit */ > + gfmq_len = ALIGN(src_cnt, 4); > + > + /* Init gfm_q2[] */ > + p += gfmq_len; > + for (i = 0; i < src_cnt; i++) > + p[i] = scf[i]; > + > + /* Filling frame 0 of compound frame descriptor with CDB */ > + cf = desc->cf_addr; > + fill_cfd_frame(cf, 0, sizeof(struct pq_cdb), desc->cdb_paddr, 0); > + > + /* Fill CFD's 1st & 2nd frame with dest buffers */ > + for (i = 1, j = 0; i < 3; i++, j++) > + fill_cfd_frame(cf, i, len, dest[j], 0); > + > + /* Fill CFD's rest of the frames with source buffers */ > + for (i = 3, j = 0; j < src_cnt; i++, j++) > + fill_cfd_frame(cf, i, len, src[j], 0); > + > + /* Setting the final bit in the last source buffer frame in CFD */ > + cf[i - 1].efrl32 |= 1 << CF_FINAL_SHIFT; > + > + return &desc->async_tx; > +} > + > +/* > + * Prep function for memcpy. In RAID Engine, memcpy is done through MOVE > + * command. 
Logic of this function will need to be modified once multipage > + * support is added in Linux's MD/ASYNC Layer > + */ > +static struct dma_async_tx_descriptor *re_jr_prep_memcpy( > + struct dma_chan *chan, dma_addr_t dest, dma_addr_t src, > + size_t len, unsigned long flags) > +{ > + struct re_jr *jr; > + struct fsl_re_dma_async_tx_desc *desc; > + size_t length; > + struct cmpnd_frame *cf; > + struct move_cdb *move; > + u32 cdb; > + > + jr = container_of(chan, struct re_jr, chan); > + > + if (len > MAX_DATA_LENGTH) { > + pr_err("Length greater than %d not supported\n", > + MAX_DATA_LENGTH); > + return NULL; > + } > + > + desc = re_jr_alloc_desc(jr, flags); > + if (desc <= 0) > + return NULL; > + > + /* Filling move CDB */ > + cdb = RE_MOVE_OPCODE << RE_CDB_OPCODE_SHIFT; > + cdb |= RE_BLOCK_SIZE << RE_CDB_BLKSIZE_SHIFT; > + cdb |= INTERRUPT_ON_ERROR << RE_CDB_ERROR_SHIFT; > + cdb |= DATA_DEPENDENCY << RE_CDB_DEPEND_SHIFT; > + > + move = desc->cdb_addr; > + move->cdb32 = cdb; > + > + /* Filling frame 0 of CFD with move CDB */ > + cf = desc->cf_addr; > + fill_cfd_frame(cf, 0, sizeof(struct move_cdb), desc->cdb_paddr, 0); > + > + length = min_t(size_t, len, MAX_DATA_LENGTH); > + > + /* Fill CFD's 1st frame with dest buffer */ > + fill_cfd_frame(cf, 1, length, dest, 0); > + > + /* Fill CFD's 2nd frame with src buffer */ > + fill_cfd_frame(cf, 2, length, src, 1); > + > + return &desc->async_tx; > +} > + > +static int re_jr_alloc_chan_resources(struct dma_chan *chan) > +{ > + struct re_jr *jr = container_of(chan, struct re_jr, chan); > + struct fsl_re_dma_async_tx_desc *desc; > + void *cf; > + dma_addr_t paddr; > + > + int i; > + > + for (i = 0; i < MAX_INITIAL_DESCS; i++) { > + desc = kzalloc(sizeof(*desc), GFP_KERNEL); > + cf = dma_pool_alloc(jr->re_dev->cf_desc_pool, GFP_ATOMIC, > + &paddr); GFP_KERNEL is ok here for both. 
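I.e. alloc_chan_resources() only runs from a sleepable context, so:

	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
	cf = dma_pool_alloc(jr->re_dev->cf_desc_pool, GFP_KERNEL,
			    &paddr);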
> + if (!desc || !cf) { > + kfree(desc); > + break; > + } > + > + INIT_LIST_HEAD(&desc->node); > + re_jr_init_desc(jr, desc, cf, paddr); > + > + list_add_tail(&desc->node, &jr->free_q); > + jr->alloc_count++; > + } > + return jr->alloc_count; > +} > + > +static void re_jr_free_chan_resources(struct dma_chan *chan) > +{ > + struct re_jr *jr = container_of(chan, struct re_jr, chan); > + struct fsl_re_dma_async_tx_desc *desc; > + > + while (jr->alloc_count--) { > + desc = list_first_entry(&jr->free_q, > + struct fsl_re_dma_async_tx_desc, > + node); > + > + list_del(&desc->node); > + dma_pool_free(jr->re_dev->cf_desc_pool, desc->cf_addr, > + desc->cf_paddr); > + kfree(desc); > + } > + > + BUG_ON(!list_empty(&jr->free_q)); > +} > + > +int re_jr_probe(struct platform_device *ofdev, > + struct device_node *np, u8 q, u32 off) > +{ > + struct device *dev; > + struct re_drv_private *repriv; > + struct re_jr *jr; > + struct dma_device *dma_dev; > + u32 ptr; > + u32 status; > + int ret = 0, rc; > + struct platform_device *jr_ofdev; > + > + dev = &ofdev->dev; > + repriv = dev_get_drvdata(dev); > + dma_dev = &repriv->dma_dev; > + > + jr = devm_kzalloc(dev, sizeof(*jr), GFP_KERNEL); > + if (!jr) { > + dev_err(dev, "No free memory for allocating JR struct\n"); > + return -ENOMEM; > + } > + > + /* create platform device for jr node */ > + jr_ofdev = of_platform_device_create(np, NULL, dev); > + if (jr_ofdev == NULL) { > + dev_err(dev, "Not able to create ofdev for jr %d\n", q); > + ret = -EINVAL; > + goto err_free; > + } > + dev_set_drvdata(&jr_ofdev->dev, jr); > + > + /* read reg property from dts */ > + rc = of_property_read_u32(np, "reg", &ptr); > + if (rc) { > + dev_err(dev, "Reg property not found in JR number %d\n", q); > + ret = -ENODEV; > + goto err_free; > + } > + > + jr->jrregs = (struct jr_config_regs *)((u8 *)repriv->re_regs + > + off + ptr); > + > + /* read irq property from dts */ > + jr->irq = irq_of_parse_and_map(np, 0); > + if (jr->irq == NO_IRQ) { > + dev_err(dev, "No IRQ defined for JR %d\n", q); > + ret = -ENODEV; > + goto err_free; > + } > + > + ret = devm_request_threaded_irq(&jr_ofdev->dev, jr->irq, re_jr_isr, > + re_jr_isr_thread, 0, jr->name, jr); > + if (ret) { > + dev_err(dev, "Unable to register JR interrupt for JR %d\n", q); > + ret = -EINVAL; > + goto err_free; > + } > + > + snprintf(jr->name, sizeof(jr->name), "re_jr%02d", q); > + > + repriv->re_jrs[q] = jr; > + jr->chan.device = dma_dev; > + jr->chan.private = jr; > + jr->dev = &jr_ofdev->dev; > + jr->re_dev = repriv; > + > + spin_lock_init(&jr->desc_lock); > + INIT_LIST_HEAD(&jr->ack_q); > + INIT_LIST_HEAD(&jr->active_q); > + INIT_LIST_HEAD(&jr->submit_q); > + INIT_LIST_HEAD(&jr->free_q); > + > + spin_lock_init(&jr->inb_lock); > + spin_lock_init(&jr->oub_lock); > + > + list_add_tail(&jr->chan.device_node, &dma_dev->channels); > + dma_dev->chancnt++; > + > + jr->inb_ring_virt_addr = dma_pool_alloc(jr->re_dev->hw_desc_pool, > + GFP_ATOMIC, &jr->inb_phys_addr); > + No need to use GFP_ATOMIC here. -- Dan