Hi Tuukka, Thanks for your replies. Please see mine inline. On Tue, Jun 13, 2017 at 5:58 PM, Tuukka Toivonen <tuukka.toivonen@xxxxxxxxx> wrote: > Hi Tomasz, > > On Monday, June 12, 2017 18:59:18 Tomasz Figa wrote: >> By any chance, doesn't the hardware provide some simple mode for >> contiguous buffers? Since we have an MMU anyway, we could use >> vb2_dma_contig and simplify the code significantly. > > In IPU3 the CIO2 (CSI-2 receiver) and the IMGU (image processing system) > are entirely separate PCI devices. The MMU is only in the IMGU device; > the CIO2 doesn't have MMU but has the FBPT (frame buffer pointer tables) > to handle discontinuous buffers. > > [...] > >> >> > + pixelformat = V4L2_PIX_FMT_IPU3_SRGGB10; >> > + >> > + alloc_devs[0] = &cio2->pci_dev->dev; >> >> Hmm, so it doesn't go through the IPU MMU in the end? > > No, it doesn't. Aha. I was confused by the fact that the driver calls dma_alloc_(non)coherent() with sizes likely greater than PAGE_SIZE. So, given the above, I believe we need to fix the LOP allocation to allocate one page at a time and stop relying on bus address contiguity. >> > +/* Called after each buffer is allocated */ >> > +static int cio2_vb2_buf_init(struct vb2_buffer *vb) >> > +{ >> > + struct cio2_device *cio2 = vb2_get_drv_priv(vb->vb2_queue); >> > + struct device *dev = &cio2->pci_dev->dev; >> > + struct cio2_buffer *b = >> > + container_of(vb, struct cio2_buffer, vbb.vb2_buf); >> > + unsigned int length = vb->planes[0].length; >> > + int lops = DIV_ROUND_UP(DIV_ROUND_UP(length, PAGE_SIZE) + > 1, >> > + PAGE_SIZE / sizeof(u32)); >> > + u32 *lop; >> > + struct sg_table *sg; >> > + struct sg_page_iter sg_iter; >> > + >> > + if (lops <= 0 || lops > CIO2_MAX_LOPS) { >> > + dev_err(dev, "%s: bad buffer size (%i)\n", __func__, > length); >> > + return -ENOSPC; /* Should never happen */ >> > + } >> > + >> > + /* Allocate LOP table */ >> > + b->lop = lop = dma_alloc_noncoherent(dev, lops * PAGE_SIZE, >> > + &b->lop_bus_addr, > GFP_KERNEL); ^^ Here is the offending allocation. >> >> > + u32 fbpt_rp = >> > + (readl(cio2->base + > CIO2_REG_CDMARI(CIO2_DMA_CHAN)) >> > + >> CIO2_CDMARI_FBPT_RP_SHIFT) >> > + & CIO2_CDMARI_FBPT_RP_MASK; >> > + >> > + /* >> > + * fbpt_rp is the fbpt entry that the dma is > currently working >> > + * on, but since it could jump to next entry at any > time, >> > + * assume that we might already be there. >> > + */ >> > + fbpt_rp = (fbpt_rp + 1) % CIO2_MAX_BUFFERS; >> >> Hmm, this is really racy. This code can be pre-empted and not execute >> for quite long time, depending on system load, resuming after the >> hardware goes even further. Technically you could prevent this using >> *_irq_save()/_irq_restore(), but I'd try to find a way that doesn't >> rely on the timing, if possible. > > That is true, if the driver doesn't get executed in more than one frame > time. I don't think that's very common, but should be handled. > > Hmm. Actually the buffer has VALID bit which is set by driver to indicate > that the HW can fill the buffer and cleared by HW to indicate that the > buffer is filled. Probably the HW can not actually jump to the next > buffer as suggested by the comment, because I think the VALID bit > would be clear in that case. That should be checked. I think the problem here is that we keep all the entries valid and only point to dummy buffers if there are no buffers queued by userspace. > >> >> > + >> > + if (bufs_queued <= 1) >> > + next = fbpt_rp + 1; /* Buffers were > drained */ >> > + else if (fbpt_rp == next) >> > + next++; >> > + next %= CIO2_MAX_BUFFERS; >> > + } >> > + >> > + while (q->bufs[next]) { >> > + /* If the entry is used, get the next one, >> > + * We can not break here if all are filled, >> > + * Will wait for one free, otherwise it will crash >> > + */ > > That comment should be fixed. "otherwise it will crash" doesn't > tell much useful. Why would it crash? > >> > + dev_dbg(&cio2->pci_dev->dev, >> > + "entry %i was already full!\n", next); >> > + next = (next + 1) % CIO2_MAX_BUFFERS; >> >> A busy waiting, possibly infinite, loop. Hmm. > > It's not really busy waiting. We have allocated CIO2_MAX_BUFFERS > buffers (or actually just buffer entries in HW table) circularly for the > hardware, and then the user has requested N buffer queue. The driver > ensures N <= CIO2_MAX_BUFFERS and this guarantees that whenever user > queues a buffer, there necessarily is a free buffer in the hardware > circular buffer list. The loop above finds the first free buffer from the > circular list, which necessarily exists. In practice it should be always > the very first since that is the oldest one given to hardware. > >> >> I think we could do something smarter here, such as sleeping on a >> wait_queue, which is woken up from the interrupt handler. > > I think that's a bit complicated for situation which should be never > possible to happen. > >> Also, why do you think it will crash? I think you can just do return >> the buffer to vb2 with _ERROR status and bail out, if you can't queue >> due to some failure. > > Agree. Given your explanation, wouldn't it make sense to actually make the loop finite, limited by the number of buffers and if (due to some unforeseen condition, like a driver bug) there is no buffer available until then, error out? Best regards, Tomasz