Dear Tomasz, Thank you for your comments. On Wed, 2019-05-22 at 19:25 +0900, Tomasz Figa wrote: > Hi Frederic, > > On Wed, May 22, 2019 at 03:14:15AM +0800, Frederic Chen wrote: > > Dear Tomasz, > > > > I appreciate your comment. It is very helpful for us. > > > > You're welcome. Thanks for replying to all the comments. I'll skip those > resolved in my reply to keep the message shorter. > > > > > On Thu, 2019-05-09 at 18:48 +0900, Tomasz Figa wrote: > > > Hi Frederic, > > > > > > On Wed, Apr 17, 2019 at 7:45 PM Frederic Chen <frederic.chen@xxxxxxxxxxxx> wrote: > [snip] > > > > + > > > > + timestamp = ktime_get_ns(); > > > > > > This timestamp is unused. > > > > This line will be removed in the next patch. > > > > > > > > > + > > > > + for (i = 0; i < MTK_DIP_VIDEO_NODE_ID_TOTAL_NUM; i++) { > > > > + struct mtk_dip_dev_buffer *dev_buf = job_info->buf_map[i]; > > > > + > > > > + if (!dev_buf) { > > > > + continue; > > > > + } else { > > > > > > No need to put this code under else. > > > > I got it. > > > > > > > > > + dev_buf->vbb.vb2_buf.timestamp = ktime_get_ns(); > > > > > > Did you mean to use the timestamp variable here? > > > > I would like to remove timestamp and still use > > dev_buf->vbb.vb2_buf.timestamp directly. > > > > That would result in buffers from different queues having different > timestamps. > > However, this is a mem2mem device, so we shouldn't assign timestamps > here. We should copy the timestamp from the matching input buffer. I > missed that before, sorry. I will copy the raw input buffer's timestamps to the mdp 0 and mdp 1 buffer's ones. > > [snip] > > > > > > > > + dev_buf->buffer_usage = node->dev_q.buffer_usage; > > > > > > There should be no buffer_usage, but rather separate queue for each usage. > > > > I will remove mtk_dip_dev_buffer.buffer_usage and > > mtk_dip_dev_queue.buffer_usage. > > > > May I add the dma port information (DMA port selection hint) in each > > video node's setting (mtk_dip_video_device_desc) so that we can retrieve > > the hint when we need to pass the buffer to co-processor as following: > > > > > > static int fill_ipi_img_param(struct mtk_dip_pipe *dip_pipe, > > struct img_image_buffer *img, > > struct mtk_dip_dev_buffer *dev_buf, > > char *buf_name) > > { > > struct mtk_dip_video_device *vdo_dev; > > // ... > > vdo_dev = mtk_dip_vbq_to_node(dev_buf->vbb.vb2_buf.vb2_queue); > > /* image parameter will be passed to firmware*/ > > img->usage = vdo_dev->desc->dma_hint; > > // ... > > } > > > > Yes, that sounds sensible, but I'm not sure "hint" is a good name here, > because the purpose is precisely defined, not just a hint. Perhaps > better to call it something like "dma_port"? > Yes, I would like to use "dma_port" since it is better. > > > > > > > > > + dev_buf->rotation = node->dev_q.rotation; > > > > + > > > > + if (desc->smem_alloc) { > > > > + dev_buf->scp_daddr = > > > > + mtk_dip_smem_iova_to_phys > > > > + (dip_pipe->smem_alloc_dev, > > > > + dev_buf->isp_daddr); > > > > > > Isn't this overly wrapped? The first argument should fit in the second line. > > > > Do you mean the modification like this: > > > > dev_buf->scp_daddr = mtk_dip_smem_iova_to_phys > > (dip_pipe->smem_alloc_dev, > > dev_buf->isp_daddr); > > Almost. The wrapped lines should be aligned to the right as much as > possible, so it would be more like: > > dev_buf->scp_daddr = mtk_dip_smem_iova_to_phys( > dip_pipe->smem_alloc_dev, > dev_buf->isp_daddr); > I got it. > [snip] > > > > + > > > > + if (fjob->sequence == -1) { > > > > + pr_err("%s: Invalid cmdq_cb_data(%p)\n", > > > > + __func__, data); > > > > + ipi_fparam = NULL; > > > > > > As far as I can see, we always get the data we explicitly gave to the cmdq > > > driver together with the callback function, so it sounds like we shouldn't > > > be able to get invalid data here. > > > > > > > I'm not sure if we can always trust CMDQ callback since we had > > suffered from an CMDQ bug that return an invalid address before. > > > > Hmm, but these values don't really go to the CMDQ hardware or firmware, > right? It's just the kernel CMDQ driver that looks the data pointer in > its internal state. That would be a bug similar to the workqueue > framework giving a bad pointer to the work function. > > Also, checking for -1 alone doesn't really help us too much, because we > could for example get a pointer to some memory filled with random > values, in case of such bug. > > Perhaps we should inspect the CMDQ driver design instead to make sure > it's impossible to receive a bad pointer from it. > I agree with you. I will remove the fjob->sequence check codes. > [snip] > > > > > + > > > > + if (WARN_ONCE(!mdpcb_work, "%s: frame_no(%d) is lost", > > > > + __func__, frameparam->frame_no)) > > > > + return; > > > > + > > > > + INIT_WORK(&mdpcb_work->frame_work, mdp_cb_worker); > > > > + mdpcb_work->frameparams = frameparam; > > > > + if (data.sta != CMDQ_CB_NORMAL) > > > > + mdpcb_work->frameparams->state = FRAME_STATE_HW_TIMEOUT; > > > > + > > > > + queue_work(dip_hw->mdpcb_workqueue, &mdpcb_work->frame_work); > > > > > > Looking at mdp_cb_worker() and mtk_dip_notify() called by it, after fixing > > > the locking to use spinlocks rather than mutexes and remove some dynamic > > > allocations pointed out in other comments, we could just run them in atomic > > > context, without the need for this workqueue. > > > > I will re-design this part to support the function to be run in > > atomic context. > > > > I would like to discuss the error handling in this function > > for the next patch. When we get FRAME_STATE_HW_TIMEOUT, is it > > suitable to trigger a work to notify SCP with scp_ipi_send() to > > dump debug messages in further? (Can this kind of debugging > > function be upstream?) > > > > Yes, we should be able to have something like that. > > > > > > > > +} > > > > + > > > > +static void dip_vpu_handler(void *data, unsigned int len, void *priv) > > > > +{ > > > > + struct img_frameparam *framejob; > > > > + struct img_ipi_frameparam *frameparam; > > > > + struct mtk_dip_hw *dip_hw; > > > > + struct mtk_dip_dev *dip_dev; > > > > + unsigned long flags; > > > > + u32 num; > > > > + > > > > + if (WARN_ONCE(!data, "%s: failed due to NULL data\n", __func__)) > > > > + return; > > > > + > > > > + frameparam = (struct img_ipi_frameparam *)data; > > > > + framejob = dip_create_framejob(frameparam->index); > > > > > > Rather than creating this dynamically, it should be already created. The > > > reason you have the priv argument to this function is that you can have the > > > SCP driver give to you a pointer to some struct that you gave to it when > > > calling scp_ipi_register(). That struct should have a list of valid VPU jobs > > > and here you should iterate through that list and find a matching job or > > > fail if there is no matching one. (We need to validate here, because we > > > can't trust the values coming from the firmware.) > > > > May I embedded img_frameparam into the extended media_request object > > so that it can be managed with media_request objects? > > > > struct mtk_dip_request { > > struct media_request request; > > struct mtk_dip_pipe_job_info job_info; > > struct mtk_dip_hw_work dip_work; > > struct img_frameparam job; > > struct mtk_dip_hw_mdpcb_timeout_work timeout_work; > > }; > > > > > > Yes, I think it would make sense indeed. > > [snip] > > > > > +static int dip_runner_func(void *data) > > > > +{ > > > > + struct img_frameparam *framejob; > > > > + struct mtk_dip_hw *dip_hw; > > > > + struct mtk_dip_dev *dip_dev; > > > > + struct mtk_dip_hw_user_id *user_id; > > > > + unsigned long flags; > > > > + bool found; > > > > + u32 queuecnt, num; > > > > + int ret; > > > > + > > > > + dip_hw = (struct mtk_dip_hw *)data; > > > > + dip_dev = mtk_dip_hw_to_dev(dip_hw); > > > > + > > > > + while (1) { > > > > + spin_lock_irqsave(&dip_hw->dip_gcejoblist.queuelock, flags); > > > > + queuecnt = dip_hw->dip_gcejoblist.queue_cnt; > > > > + spin_unlock_irqrestore(&dip_hw->dip_gcejoblist.queuelock, > > > > + flags); > > > > + > > > > + ret = wait_event_interruptible_timeout > > > > + (dip_hw->dip_runner_thread.wq, > > > > + queuecnt || kthread_should_stop(), > > > > + msecs_to_jiffies(DIP_COMPOSER_THREAD_TIMEOUT)); > > > > > > Checking queuecnt here is also wrong. Please check my comment to another > > > wait_event_* in this patch. > > > > > > Also the interruptibility and timeout don't make sense here, because we just > > > want to keep waiting unless something shows up or the kthread is forcefully > > > woken up for stopping. > > > > > > However, we probably want it to be freezable(), so that we don't have to > > > bring the thread down and up on suspend/resume. > > > > > > I would like to fix it as following: > > > > > > static bool mtk_dip_hw_qce_queuecnt(struct mtk_dip_hw *dip_hw) > > { > > int queuecnt; > > > > spin_lock_irqsave(&dip_hw->dip_gcejoblist.queuelock, flags); > > queuecnt = dip_hw->dip_gcejoblist.queue_cnt; > > spin_unlock_irqrestore(&dip_hw->dip_gcejoblist.queuelock, > > > > return queuecnt; > > } > > > > static int dip_runner_func(void *data) > > { > > > > //... > > set_freezable(); > > while(1) { > > wait_event_freezable((dip_hw->dip_runner_thread.wq, > > mtk_dip_hw_qce_queuecnt() || > > kthread_should_stop()); > > > > if (kthread_should_stop()) > > break; > > > > // Call MDP/GCE API to do HW excecution > > // ... > > > > } > > } > > > > Sounds good to me. > > [snip] > > > > > + > > > > +free_work_list: > > > > + > > > > + dev_dbg(&dip_dev->pdev->dev, > > > > + "%s, free: frame_no(%d),idx(0x%x),worklist cnt(%d),composing num(%d)\n", > > > > + __func__, dip_work->frameparams.frame_no, > > > > + dip_work->frameparams.index, len, num); > > > > + > > > > + kfree(dip_work); > > > > > > We haven't allocated this object and so we shouldn't free it here. The layer > > > that allocated it should receive it back and free. (But we probably don't > > > need to allocate it dynamically as per my other comments.) > > > > > > > May I also merge dip_work into mtk_dip_request? > > > > struct mtk_dip_request { > > struct media_request request; > > struct mtk_dip_pipe_job_info job_info; > > struct mtk_dip_hw_work dip_work; > > struct img_frameparam job; > > struct mtk_dip_hw_mdpcb_timeout_work timeout_work; > > }; > > > > Yes, it should indeed make sense. > > > > > > Also a general note - a work can be queued only once. This means that > > > current code races when two dip_works are attempted to be queued very > > > quickly one after another (or even at the same time from different threads). > > > > > > I can think of two potential options for fixing this: > > > > > > 1) Loop in the work function until there is nothing to queue to the hardware > > > anymore - but this needs tricky synchronization, because there is still > > > short time at the end of the work function when a new dip_work could be > > > added. > > > > > > 2) Change this to a kthread that just keeps running in a loop waiting for > > > some available dip_work to show up and then sending it to the firmware. > > > This should be simpler, as the kthread shouldn't have a chance to miss > > > any dip_work queued. > > > > > > I'm personally in favor of option 2, as it should simplify the > > > synchronization. > > > > > > > I would like to re-design this part with a kthread in the next patch. > > Actually I missed another option. We could have 1 work_struct for 1 > request and then we could keep using a workqueue. Perhaps that could be > simpler than a kthread. > > Actually, similar approach could be used for the dip_runner_func. > Instead of having a kthread looping, we could just have another > workqueue and 1 dip_runner_work per 1 request. Then we wouldn't need to > do the waiting loop ourselves anymore. > > Does it make sense? Yes, it make sense. Let me summarize the modification about the flow. First, we will have two work_struct in mtk_dip_request. struct mtk_dip_request { struct media_request request; //... /* Prepare DIP part hardware configurtion */ struct mtk_dip_hw_submit_work submit_work; /* Replace dip_running thread jobs*/ struct mtk_dip_hw_composing_work composing_work; /* Only for composing error handling */ struct mtk_dip_hw_mdpcb_timeout_work timeout_work; }; Second, the overall flow of handling each request is : 1. mtk_dip_hw_enqueue calls queue_work() to put submit_work into its workqueue 2. submit_work sends IMG_IPI_FRAME command to SCP to prepare DIP hardware configuration 3. dip_scp_handler receives the IMG_IPI_FRAME result from SCP 4. dip_scp_handler calls queue_work() to put composing_work (instead of original dip_running thread jobs) into its workqueue 5. composing_work calls dip_mdp_cmdq_send() to finish the mdp part tasks 6. dip_mdp_cb_func() trigged by MDP driver calls vb2_buffer_done to return the buffer (no workqueue required here) > > [snip] > > > > > + buf->buffer.iova = sg_dma_address(buf->table.sgl); > > > > + buf->tuning_buf.iova = buf->buffer.iova + > > > > + DIP_TUNING_OFFSET; > > > > + > > > > + dev_dbg(&dip_dev->pdev->dev, > > > > + "%s: buf(%d), pa(%p), iova(%p)\n", > > > > + __func__, i, buf->buffer.pa, buf->buffer.iova); > > > > + > > > > + dev_dbg(&dip_dev->pdev->dev, > > > > + "%s: config_data(%d), pa(%p), iova(%p)\n", > > > > + __func__, i, buf->config_data.pa, buf->config_data.va); > > > > + > > > > + dev_dbg(&dip_dev->pdev->dev, > > > > + "%s: tuning_buf(%d), pa(%p), iova(%p)\n", > > > > + __func__, i, buf->tuning_buf.pa, buf->tuning_buf.iova); > > > > + > > > > + dev_dbg(&dip_dev->pdev->dev, > > > > + "%s: frameparam(%d), pa(%p), iova(%p)\n", > > > > + __func__, i, buf->frameparam.pa, buf->frameparam.va); > > > > + > > > > + list_add_tail(&buf->list_entry, > > > > + &dip_hw->dip_freebufferlist.queue); > > > > + dip_hw->dip_freebufferlist.queue_cnt++; > > > > + kfree(pages); > > > > + } > > > > > > But actually, why aren't these buffers managed by VB2? > > > > > > > I would like to manage the buffer with VB2, but I'm not sure what is > > the right way to do that. > > > > These are DIP internal working buffers. The buffers are not read or > > written by user application. In this case, could I add a meta capture > > video device to manage them or integrate the buffers with tuning meta > > output buffer directly? > > > > Ah, if these are internal buffers, then I guess we indeed need to > allocate them ourselves. I got it. > > [snip] > > > > > + frameparam.state = FRAME_STATE_INIT; > > > > + dip_send(dip_hw->vpu_pdev, SCP_IPI_DIP_INIT, > > > > + (void *)&frameparam, sizeof(frameparam), 0); > > > > > > Is the call above synchronous? If not, don't we need to wait here for SCP to > > > initialize? > > > > > > > No, it is not synchronous when passing 0 in the latest parameter. We > > would like to improve it by using scp_ipi_send() directly instead of > > dip_send(), and wait for the ack from SCP with 2ms timeout setting. > > > > ret = scp_ipi_send(dip_hw->scp_pdev SCP_IPI_DIP_INIT, > > &frameparam, sizeof(frameparam), 2); > > > > 2ms could likely fail here quite often, because even kernel code can be > preempted. We should have something like 200ms here, just to avoid > blocking indefinitely, but long enough not to happen even during high > system load. > I got it. > [snip] > > > > > > > + framework->frameparams.state = FRAME_STATE_INIT; > > > > + framework->frameparams.frame_no = > > > > + atomic_inc_return(&dip_hw->dip_enque_cnt); > > > > > > It sounds like the struct passed as the frameparams argument to this > > > function and framework->frameparams should be two different structs. > > > > > > > I'm not sure if I understand it. Would you like to elaborate on this > > point? > > > > I would like to merge mtk_dip_hw_enqueue and mtk_dip_pipe_job_start so > > that we can configure the framework->frameparams in a single function. > > > > Ah, that comment might have been added before I noticed that the > functions can be merged. Forgot to remove, sorry. Please ignore. > > [snip] > > > > > +static int __maybe_unused mtk_dip_pm_suspend(struct device *dev) > > > > +{ > > > > + struct mtk_dip_dev *dip_dev = dev_get_drvdata(dev); > > > > + > > > > + if (atomic_read(&dip_dev->dip_hw.dip_user_cnt) > 0) { > > > > + mtk_dip_hw_set_clk(&dip_dev->dip_hw, false); > > > > + dev_dbg(&dip_dev->pdev->dev, "%s: disable clock\n", > > > > + __func__); > > > > + } > > > > > > Don't we need to do something to prevent the driver from scheduling next > > > frames here and wait for the hardware to finish processing current frame > > > here? > > > > > > > I will add some design to handle this case. Could we add a > > suspend variable in mtk_dip_hw and check it in submit work kthread? > > > > Actually, with the other changes, the kthread would have been frozen for > us at this point, so maybe there isn't anything to do here! :) > > OR, if we change the kthread to a workqueue (as per my other comment > above), we could make the workqueue freezable (by adding WQ_FREEZABLE to > the flags argument at creation time) and it would be frozen for us after > finishing current work struct. > Thank you for the advice. I would like to use workqueue with WQ_FREEZABLE. > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +static int __maybe_unused mtk_dip_pm_resume(struct device *dev) > > > > +{ > > > > + struct mtk_dip_dev *dip_dev = dev_get_drvdata(dev); > > > > + > > > > + if (atomic_read(&dip_dev->dip_hw.dip_user_cnt) > 0) { > > > > + mtk_dip_hw_set_clk(&dip_dev->dip_hw, true); > > > > + dev_dbg(&dip_dev->pdev->dev, "%s: enable clock\n", > > > > + __func__); > > > > + } > > > > > > Don't we need to kick off the hardware here to continue processing from > > > the next frame? > > > > > > > I will add some design to handle this case. > > > > As per above, making the kthreads/workqueues freezable actually should > solve the problem. > > [snip] > > > > > +static int mtk_dip_subdev_close(struct v4l2_subdev *sd, > > > > + struct v4l2_subdev_fh *fh) > > > > +{ > > > > + struct mtk_dip_pipe *dip_pipe = mtk_dip_subdev_to_pipe(sd); > > > > + struct mtk_dip_dev *dip_dev = > > > > + dev_get_drvdata(&dip_pipe->dip_dev->pdev->dev); > > > > + > > > > + dev_dbg(&dip_pipe->dip_dev->pdev->dev, > > > > + "%s:%s: pipe(%d) disconnect to dip_hw\n", > > > > + __func__, dip_pipe->desc->name, > > > > + dip_pipe->desc->id); > > > > + > > > > + return mtk_dip_hw_disconnect(&dip_dev->dip_hw); > > > > +} > > > > > > We shouldn't do any hardware operations just at subdev open or close. Those > > > should happen once the streaming actually starts. > > > > > > Userspace is expected to open device nodes just to query them and that > > > shouldn't have any side effects. > > > > > > In general I don't see why we actually should even do anything in subdev ops > > > in this driver. The subdev is only exposed to bridge all the video nodes > > > together and any control is done via either video ioctls or metadata > > > buffers. > > > > > > > I will move the hardware operation to streamon. We call rproc_boot() to > > load frimware in mtk_dip_hw_connect(), should it be moved to streamon, > > too? > > > > I think that would normally be done in runtime PM callbacks, with > autosuspend delay enabled to avoid continuous power-off and on > repeateadly. > > Then, when there is a request to queue, the driver would call > pm_runtime_get_sync() and when a request completes, > pm_runtime_put_autosuspend(). > I got it. I will put it in PM callbacks and use pm_runtime_get_sync() and pm_runtime_put_autosuspend() in request handling flow. > [snip] > > > > > +static void mtk_dip_node_to_v4l2(struct mtk_dip_pipe *dip_pipe, > > > > + u32 idx, > > > > + struct video_device *vdev, > > > > + struct v4l2_format *f) > > > > +{ > > > > + u32 cap; > > > > + struct mtk_dip_video_device *node = &dip_pipe->nodes[idx]; > > > > + > > > > + cap = mtk_dip_node_get_v4l2_cap(dip_pipe, node); > > > > + vdev->ioctl_ops = node->desc->ops; > > > > + vdev->device_caps = V4L2_CAP_STREAMING | cap; > > > > > > Why not just have mtk_dip_node_get_v4l2_cap() include V4L2_CAP_STREAMING in > > > the return value? > > > > > > > I would like to set the V4L2_CAP_STREAMING in node->desc->cap directly. > > > > vdev->device_caps = node->desc->caps; > > > > Sure, that's also good. > > Best regards, > Tomasz Sincerely, Frederic Chen