Hi Yogesh, Boris, On 10.12.18 11:19, Boris Brezillon wrote: > On Mon, 10 Dec 2018 09:41:51 +0000 > Yogesh Narayan Gaur <yogeshnarayan.gaur@xxxxxxx> wrote: > >>>> +/* Instead of busy looping invoke readl_poll_timeout functionality. >>>> +*/ static int fspi_readl_poll_tout(struct nxp_fspi *f, void __iomem *base, >>>> + u32 mask, u32 delay_us, >>>> + u32 timeout_us, bool condition) >>>> +{ >>>> + u32 reg; >>>> + >>>> + if (!f->devtype_data->little_endian) >>>> + mask = (u32)cpu_to_be32(mask); >>>> + >>>> + if (condition) >>>> + return readl_poll_timeout(base, reg, (reg & mask), >>>> + delay_us, timeout_us); >>>> + else >>>> + return readl_poll_timeout(base, reg, !(reg & mask), >>>> + delay_us, timeout_us); >>> >>> I would rather use a local variable to store the condition: >>> >>> bool c = condition ? (reg & mask):!(reg & mask); >>> >> With these type of usage getting below warning messages. >> >> drivers/spi/spi-nxp-fspi.c: In function ‘fspi_readl_poll_tout.isra.10.constprop’: >> drivers/spi/spi-nxp-fspi.c:446:21: warning: ‘reg’ may be used uninitialized in this function [-Wmaybe-uninitialized] >> bool cn = c ? (reg & mask) : !(reg & mask); >> >> If assign value to reg = 0xffffffff then timeout is start getting hit for False case and if assign value 0 then start getting timeout hit for true case. >> >> I would rather not try to modify this function. > > I agree. Let's keep this function readable even if this implies > duplicating a few lines of code. My bad. This doesn't work of course. We need to pass the actual expression containing reg to the readl_poll_timeout() macro. So forget about my comment. > >> >>> return readl_poll_timeout(base, reg, c, delay_us, timeout_us); >>> >>>> +} >>>> + >>>> +/* >>>> + * If the slave device content being changed by Write/Erase, need to >>>> + * invalidate the AHB buffer. This can be achieved by doing the reset >>>> + * of controller after setting MCR0[SWRESET] bit. >>>> + */ >>>> +static inline void nxp_fspi_invalid(struct nxp_fspi *f) { >>>> + u32 reg; >>>> + int ret; >>>> + >>>> + reg = fspi_readl(f, f->iobase + FSPI_MCR0); >>>> + fspi_writel(f, reg | FSPI_MCR0_SWRST, f->iobase + FSPI_MCR0); >>>> + >>>> + /* w1c register, wait unit clear */ >>>> + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_MCR0, >>>> + FSPI_MCR0_SWRST, 0, POLL_TOUT, false); >>>> + WARN_ON(ret); >>>> +} >>>> + >>>> +static void nxp_fspi_prepare_lut(struct nxp_fspi *f, >>>> + const struct spi_mem_op *op) >>>> +{ >>>> + void __iomem *base = f->iobase; >>>> + u32 lutval[4] = {}; >>>> + int lutidx = 1, i; >>>> + >>>> + /* cmd */ >>>> + lutval[0] |= LUT_DEF(0, LUT_CMD, LUT_PAD(op->cmd.buswidth), >>>> + op->cmd.opcode); >>>> + >>>> + /* addr bus width */ >>>> + if (op->addr.nbytes) { >>>> + u32 addrlen = 0; >>>> + >>>> + switch (op->addr.nbytes) { >>>> + case 1: >>>> + addrlen = ADDR8BIT; >>>> + break; >>>> + case 2: >>>> + addrlen = ADDR16BIT; >>>> + break; >>>> + case 3: >>>> + addrlen = ADDR24BIT; >>>> + break; >>>> + case 4: >>>> + addrlen = ADDR32BIT; >>>> + break; >>>> + default: >>>> + dev_err(f->dev, "In-correct address length\n"); >>>> + return; >>>> + } >>> >>> You don't need to validate op->addr.nbytes here, this is already done in >>> nxp_fspi_supports_op(). >> >> Yes, I need to validate op->addr.nbytes else LUT would going to be programmed for 0 addrlen. >> I have checked this on the target. > > Also agree there. Some operations have 0 address bytes. We could also > test addr.buswidth, but I'm fine with the addr.nbytes test too. The "if (op->addr.nbytes)" is needed of course, but I think the default case in the switch statement (and for other reasons the whole switch statement) is not needed and rather a check for op->addr.nbytes > 4 should be added to nxp_fspi_supports_op(). I wrongly assumed this check already exists in nxp_fspi_supports_op(). > >>>> +static void nxp_fspi_select_mem(struct nxp_fspi *f, struct spi_device >>>> +*spi) { >>>> + unsigned long rate = spi->max_speed_hz; >>>> + int ret; >>>> + uint64_t size_kb; >>>> + >>>> + /* >>>> + * Return, if previously selected slave device is same as current >>>> + * requested slave device. >>>> + */ >>>> + if (f->selected == spi->chip_select) >>>> + return; >>>> + >>>> + /* Reset FLSHxxCR0 registers */ >>>> + fspi_writel(f, 0, f->iobase + FSPI_FLSHA1CR0); >>>> + fspi_writel(f, 0, f->iobase + FSPI_FLSHA2CR0); >>>> + fspi_writel(f, 0, f->iobase + FSPI_FLSHB1CR0); >>>> + fspi_writel(f, 0, f->iobase + FSPI_FLSHB2CR0); >>>> + >>>> + /* Assign controller memory mapped space as size, KBytes, of flash. */ >>>> + size_kb = FSPI_FLSHXCR0_SZ(f->memmap_phy_size); >>> >> Above description of this function, explains the reason for using memmap_phy_size. >> This is not the arbitrary size, but the memory mapped size being assigned to the controller. >> >>> You are still using memory of arbitrary size (memmap_phy_size) for mapping the >>> flash. Why not use the same approach as in the QSPI driver and just map >>> ahb_buf_size until we implement the dirmap API? >> The approach which being used in QSPI driver didn't work here, I have tried with that. >> In QSPI driver, while preparing LUT we are assigning read/write address in the LUT preparation and have to for some unknown hack have to provide macro for LUT_MODE instead of LUT_ADDR. >> But this thing didn't work for FlexSPI. >> I discussed with HW IP owner and they suggested only to use LUT_ADDR for specifying the address length of the command i.e. 3-byte or 4-byte address command (NOR) or 1-2 byte address command for NAND. > > Actually, we would have used a LUT_ADDR too if the QSPI IP was support > ADDR instructions with a number of bytes < 3, but for some unknown > reasons it does not work. > >> >> Thus, in LUT preparation we have assigned only the base address. >> Now if I have assigned ahb_buf_size to FSPI_FLSHXXCR0 register then for read/write data beyond limit of ahb_buf_size offset I get data corruption. > > Why would you do that? We have the ->adjust_op_size() exactly for this > reason, so, if someone tries to do a spi_mem_op with data.nbytes > > ahb_buf_size you should return an error. > >> >> Thus, for generic approach have assigned FSPI_FLSHXXCR0 equal to the memory mapped size to the controller. This would also not going to depend on the number of CS present on the target. > > I kind of agree with Frieder on that one, I think it's preferable to > limit the per-read-op size to ahb_buf_size and let the upper layer > split the request in several sub-requests. On the controller side of > things, you just have to have a mapping of ahb_buf_size per-CS. If you > want to further optimize things, implement the dirmap hooks. > >> >>> You are already aligning the AHB reads for this in nxp_fspi_adjust_op_size(). >>> >> Yes, max read data size can be ahb_buf_size. Thus we need to check max read size with ahb_buf_size. > > Well, it's never a bad thing to check it twice, just in case the > spi-mem user is misusing the API. > >>>> +static void nxp_fspi_fill_txfifo(struct nxp_fspi *f, >>>> + const struct spi_mem_op *op) >>>> +{ >>>> + void __iomem *base = f->iobase; >>>> + int i, j, ret; >>>> + int size, tmp_size, wm_size; >>>> + u32 data = 0; >>>> + u32 *txbuf = (u32 *) op->data.buf.out; >>>> + >>>> + /* clear the TX FIFO. */ >>>> + fspi_writel(f, FSPI_IPTXFCR_CLR, base + FSPI_IPTXFCR); >>>> + >>>> + /* Default value of water mark level is 8 bytes. */ >>>> + wm_size = 8; >>>> + size = op->data.nbytes / wm_size; >>>> + for (i = 0; i < size; i++) { >>>> + /* Wait for TXFIFO empty */ >>>> + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR, >>>> + FSPI_INTR_IPTXWE, 0, >>>> + POLL_TOUT, true); >>>> + WARN_ON(ret); >>>> + >>>> + j = 0; >>>> + tmp_size = wm_size; >>>> + while (tmp_size > 0) { >>>> + data = 0; >>>> + memcpy(&data, txbuf, 4); >>>> + fspi_writel(f, data, base + FSPI_TFDR + j * 4); >>>> + tmp_size -= 4; >>>> + j++; >>>> + txbuf += 1; >>>> + } >>>> + fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR); >>>> + } >>>> + >>>> + size = op->data.nbytes % wm_size; >>>> + if (size) { >>>> + /* Wait for TXFIFO empty */ >>>> + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR, >>>> + FSPI_INTR_IPTXWE, 0, >>>> + POLL_TOUT, true); >>>> + WARN_ON(ret); >>>> + >>>> + j = 0; >>>> + tmp_size = 0; >>>> + while (size > 0) { >>>> + data = 0; >>>> + tmp_size = (size < 4) ? size : 4; >>>> + memcpy(&data, txbuf, tmp_size); >>>> + fspi_writel(f, data, base + FSPI_TFDR + j * 4); >>>> + size -= tmp_size; >>>> + j++; >>>> + txbuf += 1; >>>> + } >>>> + fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR); >>>> + } >>> >>> All these nested loops to fill the TX buffer and also the ones below to read the >>> RX buffer look much more complicated than they should really be. Can you try to >>> make this more readable? >> Yes >>> >>> Maybe something like this would work: >>> >>> for (i = 0; i < ALIGN_DOWN(op->data.nbytes, 8); i += 8) { >>> /* Wait for TXFIFO empty */ >>> ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR, >>> FSPI_INTR_IPTXWE, 0, >>> POLL_TOUT, true); >>> >>> fspi_writel(f, op->data.buf.out + i, base + FSPI_TFDR); >>> fspi_writel(f, op->data.buf.out + i + 4, base + FSPI_TFDR + 4); >>> fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR); } >> With this above 2 lines we are hardcoding it for read/write with watermark size as 8 bytes. >> Watermark size can be variable and depends on the value of IPRXFCR/IPTXFCR register with default value as 8 bytes >> Thus, I would still prefer to use the internal for loop instead of 2 fspi_writel(...) for FSPI_TFDR and FSPI_TFDR + 4 register write commands. > > Just like you're hardcoding wm_size to 8, so I don't see a difference > here. And I indeed prefer Frieder's version. Yes, as long as the watermark level is fixed, we don't need the inner loop.