Hi Nicolas > -----Original Message----- > From: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx> > Sent: Thursday, December 19, 2024 1:42 AM > To: jackson.lee <jackson.lee@xxxxxxxxxxxxxxx>; mchehab@xxxxxxxxxx; > hverkuil-cisco@xxxxxxxxx; sebastian.fricke@xxxxxxxxxxxxx; > bob.beckett@xxxxxxxxxxxxx; dafna.hirschfeld@xxxxxxxxxxxxx > Cc: linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; lafley.kim > <lafley.kim@xxxxxxxxxxxxxxx>; b-brnich@xxxxxx; hverkuil@xxxxxxxxx; Nas > Chung <nas.chung@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH v1 3/5] media: chips-media: wave5: Improve performance > of decoder > > Hi Jackson, > > Le mercredi 18 décembre 2024 à 05:05 +0000, jackson.lee a écrit : > > Hi Nicolas > > > > Yes, this patch is very complex in order to review. > > So I will split this patch to three patch sets. > > > > 1. Main patch to improve performance of decoder. > > As you know, the previous code's flow was driver has to wait until > getting a decoded frame after feeding a bitstream, then next bitstream was > fed into a firmware. > > To archive goal, we needed to work asynchronously between output and > capture queue. > > So we had to remove the v4l2_m2m_job_finish from > wave5_vpu_dec_finish_decode and had to put it into the device_run to > handle decoding. > > > > 2. Since applying changes for performance improvement of decoder, we had > a problem related to high CPU load, > > CPU load was more than 4 times when comparing CPU load between the > previous and current code. > > The root cause was the device_run was called many times even if there > was no bitstream which should be queued. > > This needs changes in job_ready() ops, which is what decide if the device > must run or not. It seems difficult to do 2. separately from 1. though, > but perhaps you can do that before 1. > To reduce CPU Load, the job_ready() was modified. Thanks > > > > 3. Fix Null reference while testing fluster with more than 2 decoders. > > When multi instances are created/destroyed, many interrupts happens > or structures for decoder are removed. > > "struct vpu_instance" this structure is shared for all flow in > decoder, so if the structure is not protected by lock, Null reference > exception could happens sometimes. > > IRQ Handler was spilt to two phases and Lock was added as well. > > This #3 is what introduced most of the refactoring, perhaps you want this > to come first. In short, if you move code around, do that separately, then > add the locking in preparation for the newly introduce concurrency from 1. > > Thanks for this information, it will certainly help in the next review. > > regards, > Nicolas > > > > > > > > -----Original Message----- > > > From: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx> > > > Sent: Saturday, December 14, 2024 4:48 AM > > > To: jackson.lee <jackson.lee@xxxxxxxxxxxxxxx>; mchehab@xxxxxxxxxx; > > > hverkuil-cisco@xxxxxxxxx; sebastian.fricke@xxxxxxxxxxxxx; > > > bob.beckett@xxxxxxxxxxxxx; dafna.hirschfeld@xxxxxxxxxxxxx > > > Cc: linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > > > lafley.kim <lafley.kim@xxxxxxxxxxxxxxx>; b-brnich@xxxxxx; > > > hverkuil@xxxxxxxxx; Nas Chung <nas.chung@xxxxxxxxxxxxxxx> > > > Subject: Re: [PATCH v1 3/5] media: chips-media: wave5: Improve > > > performance of decoder > > > > > > Le lundi 09 décembre 2024 à 14:36 +0900, Jackson.lee a écrit : > > > > The existing way for decoding frames was to wait until each frame > > > > was decoded after feeding a bitstream. As a result, performance > > > > was low and Wave5 could not achieve max pixel processing rate. > > > > > > > > Update driver to use an asynchronous approach for decoding and > > > > feeding a bitstream in order to achieve full capabilities of the > device. > > > > > > That patch is just massive and difficult to review. It does > > > important change to the driver design, which require important > > > threading review. I'm not sure I can provide a good review without > > > this change being better introduced and broken down. Few things: > > > > > > - Try and explain the threading changes in the commits, and perhaps > > > introduce the change in the cover letter > > > > > > > - Split the refactoring of the ring buffer handling in separate non- > > > functional change patches > > > > I didn't modify code to handle the ring buffer, its line was just moved > to prevent compiling error. > > > > > > > - Change the job_ready() logic and use the m2m thread instead of > creating > > > another one and wasting context switches > > > > > > I will do that. > > > > > > Thanks > > Jackson > > > > > > > > > > What needs a good theoretical review, is the handling of "abort()" > flow. > > > We had a big struggle with that while helping with the up-streaming. > > > That is more or less why for time constraints reason we went for > > > this sub- optimal method of idling the HW between jobs. Don't read > > > me wrong, I'm very happy so see work being done on that. > > > > > > In parallel, do you agree that 1/5 and 2/5 could be applied already > > > ? If so, can you send these two in their own serie, and please, add > > > the missing > > > Fixes: tag, which are needed for backport purpose. > > > > > > regards, > > > Nicolas > > > > > > > > > > > Signed-off-by: Jackson.lee <jackson.lee@xxxxxxxxxxxxxxx> > > > > Signed-off-by: Nas Chung <nas.chung@xxxxxxxxxxxxxxx> > > > > --- > > > > .../platform/chips-media/wave5/wave5-helper.c | 20 +- > > > > .../chips-media/wave5/wave5-vpu-dec.c | 400 +++++++++++------ > - > > > > .../chips-media/wave5/wave5-vpu-enc.c | 8 +- > > > > .../platform/chips-media/wave5/wave5-vpu.c | 73 +++- > > > > .../platform/chips-media/wave5/wave5-vpuapi.c | 24 +- > > > > .../platform/chips-media/wave5/wave5-vpuapi.h | 15 +- > > > > 6 files changed, 355 insertions(+), 185 deletions(-) > > > > > > > > diff --git > > > > a/drivers/media/platform/chips-media/wave5/wave5-helper.c > > > > b/drivers/media/platform/chips-media/wave5/wave5-helper.c > > > > index 2c9d8cbca6e4..83f30ad36e4f 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c > > > > @@ -30,7 +30,6 @@ const char *state_to_str(enum vpu_instance_state > > > > state) void wave5_cleanup_instance(struct vpu_instance *inst) { > > > > int i; > > > > - > > > > > > nit: I kind of liked that blank line, but its yours to decide what > > > you like the best. > > > > > > > /* > > > > * For Wave515 SRAM memory is allocated at > > > > * wave5_vpu_dec_register_device() and freed at @@ -49,7 > +48,9 > > > > @@ void wave5_cleanup_instance(struct vpu_instance *inst) > > > > v4l2_fh_del(&inst->v4l2_fh); > > > > v4l2_fh_exit(&inst->v4l2_fh); > > > > } > > > > - list_del_init(&inst->list); > > > > + > > > > + kfifo_free(&inst->irq_status); > > > > + mutex_destroy(&inst->feed_lock); > > > > ida_free(&inst->dev->inst_ida, inst->id); > > > > kfree(inst->codec_info); > > > > kfree(inst); > > > > @@ -61,8 +62,22 @@ int wave5_vpu_release_device(struct file *filp, > { > > > > struct vpu_instance *inst = wave5_to_vpu_inst(filp- > >private_data); > > > > int ret = 0; > > > > + unsigned long flags; > > > > + > > > > + if (inst->run_thread) { > > > > + kthread_stop(inst->run_thread); > > > > + up(&inst->run_sem); > > > > + inst->run_thread = NULL; > > > > + } > > > > > > > > v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx); > > > > + ret = mutex_lock_interruptible(&inst->dev->irq_lock); > > > > + if (ret) > > > > + return ret; > > > > + spin_lock_irqsave(&inst->dev->irq_spinlock, flags); > > > > + list_del_init(&inst->list); > > > > + spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags); > > > > + mutex_unlock(&inst->dev->irq_lock); > > > > if (inst->state != VPU_INST_STATE_NONE) { > > > > u32 fail_res; > > > > > > > > @@ -79,7 +94,6 @@ int wave5_vpu_release_device(struct file *filp, > > > > } > > > > > > > > wave5_cleanup_instance(inst); > > > > - > > > > return ret; > > > > } > > > > > > > > diff --git > > > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > > > > index d3ff420c52ce..b0e1f94bdb30 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > > > > @@ -6,6 +6,9 @@ > > > > */ > > > > > > > > #include <linux/pm_runtime.h> > > > > +#include <linux/delay.h> > > > > +#include <linux/timer.h> > > > > +#include <linux/atomic.h> > > > > #include "wave5-helper.h" > > > > > > > > #define VPU_DEC_DEV_NAME "C&M Wave5 VPU decoder" > > > > @@ -101,6 +104,24 @@ static const struct vpu_format > > > dec_fmt_list[FMT_TYPES][MAX_FMTS] = { > > > > } > > > > }; > > > > > > > > +static int run_thread(void *data) { > > > > + struct vpu_instance *inst = (struct vpu_instance *)data; > > > > + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; > > > > + > > > > + while (!kthread_should_stop()) { > > > > + if (down_interruptible(&inst->run_sem)) > > > > + continue; > > > > + > > > > + if (kthread_should_stop()) > > > > + break; > > > > + > > > > + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > /* > > > > * Make sure that the state switch is allowed and add logging for > > > debugging > > > > * purposes > > > > @@ -230,7 +251,6 @@ static int start_decode(struct vpu_instance > > > > *inst, > > > u32 *fail_res) > > > > switch_state(inst, VPU_INST_STATE_STOP); > > > > > > > > dev_dbg(inst->dev->dev, "%s: pic run failed / finish > job", > > > __func__); > > > > - v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); > > > > } > > > > > > > > return ret; > > > > @@ -347,7 +367,6 @@ static void wave5_vpu_dec_finish_decode(struct > > > vpu_instance *inst) > > > > struct vb2_v4l2_buffer *dec_buf = NULL; > > > > struct vb2_v4l2_buffer *disp_buf = NULL; > > > > struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx); > > > > - struct queue_status_info q_status; > > > > > > > > dev_dbg(inst->dev->dev, "%s: Fetch output info from > firmware.", > > > > __func__); > > > > > > > > @@ -360,11 +379,22 @@ static void > > > > wave5_vpu_dec_finish_decode(struct > > > > vpu_instance *inst) > > > > > > > > dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad", > __func__, > > > &dec_info.rd_ptr, > > > > &dec_info.wr_ptr); > > > > - wave5_handle_src_buffer(inst, dec_info.rd_ptr); > > > > > > > > dev_dbg(inst->dev->dev, "%s: dec_info dec_idx %i > disp_idx %i", > > > __func__, > > > > dec_info.index_frame_decoded, > dec_info.index_frame_display); > > > > > > > > + if (inst->std == W_AVC_DEC && > > > > + dec_info.index_frame_decoded == DECODED_IDX_FLAG_SKIP && > > > > + dec_info.index_frame_display == DISPLAY_IDX_FLAG_NO_FB) { > > > > + struct vb2_v4l2_buffer *src_buf = > > > v4l2_m2m_src_buf_remove(m2m_ctx); > > > > + > > > > + if (src_buf) > > > > + v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR); > > > > + return; > > > > + } > > > > + > > > > + wave5_handle_src_buffer(inst, dec_info.rd_ptr); > > > > + > > > > if (!vb2_is_streaming(dst_vq)) { > > > > dev_dbg(inst->dev->dev, "%s: capture is not > streaming..", > > > __func__); > > > > v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); @@ - > 441,20 > > > +471,6 > > > > @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst) > > > > } > > > > spin_unlock_irqrestore(&inst->state_spinlock, flags); > > > > } > > > > - > > > > - /* > > > > - * During a resolution change and while draining, the > firmware may > > > flush > > > > - * the reorder queue regardless of having a matching decoding > > > operation > > > > - * pending. Only terminate the job if there are no more IRQ > coming. > > > > - */ > > > > - wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, > &q_status); > > > > - if (q_status.report_queue_count == 0 && > > > > - (q_status.instance_queue_count == 0 || > > > dec_info.sequence_changed)) { > > > > - dev_dbg(inst->dev->dev, "%s: finishing job.\n", > __func__); > > > > - pm_runtime_mark_last_busy(inst->dev->dev); > > > > - pm_runtime_put_autosuspend(inst->dev->dev); > > > > - v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); > > > > - } > > > > } > > > > > > > > static int wave5_vpu_dec_querycap(struct file *file, void *fh, > > > > struct v4l2_capability *cap) @@ -465,6 +481,142 @@ static int > > > wave5_vpu_dec_querycap(struct file *file, void *fh, struct > > > v4l2_capab > > > > return 0; > > > > } > > > > > > > > +static int write_to_ringbuffer(struct vpu_instance *inst, void > > > > +*buffer, > > > size_t buffer_size, > > > > + struct vpu_buf *ring_buffer, dma_addr_t > wr_ptr) > > > { > > > > + size_t size; > > > > + size_t offset = wr_ptr - ring_buffer->daddr; > > > > + int ret; > > > > + > > > > + if (wr_ptr + buffer_size > ring_buffer->daddr + > > > > +ring_buffer->size) > > > { > > > > + size = ring_buffer->daddr + ring_buffer->size - wr_ptr; > > > > + ret = wave5_vdi_write_memory(inst->dev, ring_buffer, > offset, > > > (u8 *)buffer, size); > > > > + if (ret < 0) > > > > + return ret; > > > > + > > > > + ret = wave5_vdi_write_memory(inst->dev, ring_buffer, 0, > (u8 > > > *)buffer + size, > > > > + buffer_size - size); > > > > + if (ret < 0) > > > > + return ret; > > > > + } else { > > > > + ret = wave5_vdi_write_memory(inst->dev, ring_buffer, > offset, > > > (u8 *)buffer, > > > > + buffer_size); > > > > + if (ret < 0) > > > > + return ret; > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +static int fill_ringbuffer(struct vpu_instance *inst) { > > > > + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; > > > > + struct vpu_src_buffer *vpu_buf; > > > > + int ret = 0; > > > > + > > > > + if (m2m_ctx->last_src_buf) { > > > > + struct vpu_src_buffer *vpu_buf = > > > > +wave5_to_vpu_src_buf(m2m_ctx->last_src_buf); > > > > + > > > > + if (vpu_buf->consumed) { > > > > + dev_dbg(inst->dev->dev, "last src buffer > already > > > written\n"); > > > > + return 0; > > > > + } > > > > + } > > > > + > > > > + list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) { > > > > + struct vb2_v4l2_buffer *vbuf = &vpu_buf- > >v4l2_m2m_buf.vb; > > > > + struct vpu_buf *ring_buffer = &inst->bitstream_vbuf; > > > > + size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, > 0); > > > > + void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); > > > > + dma_addr_t rd_ptr = 0; > > > > + dma_addr_t wr_ptr = 0; > > > > + size_t remain_size = 0; > > > > + > > > > + if (vpu_buf->consumed) { > > > > + dev_dbg(inst->dev->dev, "already copied src buf > (%u) > > > to the ring buffer\n", > > > > + vbuf->vb2_buf.index); > > > > + continue; > > > > + } > > > > + > > > > + if (!src_buf) { > > > > + dev_dbg(inst->dev->dev, > > > > + "%s: Acquiring kernel pointer to src buf > (%u), > > > fail\n", > > > > + __func__, vbuf->vb2_buf.index); > > > > + break; > > > > + } > > > > + > > > > + ret = wave5_vpu_dec_get_bitstream_buffer(inst, &rd_ptr, > > > &wr_ptr, &remain_size); > > > > + if (ret) { > > > > + /* Unable to acquire the mutex */ > > > > + dev_err(inst->dev->dev, "Getting the bitstream > buffer, > > > fail: %d\n", > > > > + ret); > > > > + return ret; > > > > + } > > > > + > > > > + dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad", > > > __func__, > > > > +&rd_ptr, &wr_ptr); > > > > + > > > > + if (remain_size < src_size) { > > > > + dev_dbg(inst->dev->dev, > > > > + "%s: remaining size: %zu < source > size: %zu for > > > src buf (%u)\n", > > > > + __func__, remain_size, src_size, vbuf- > > > > vb2_buf.index); > > > > + break; > > > > + } > > > > + > > > > + ret = write_to_ringbuffer(inst, src_buf, src_size, > > > ring_buffer, wr_ptr); > > > > + if (ret) { > > > > + dev_err(inst->dev->dev, "Write src buf (%u) to > ring > > > buffer, fail: %d\n", > > > > + vbuf->vb2_buf.index, ret); > > > > + return ret; > > > > + } > > > > + > > > > + ret = wave5_vpu_dec_update_bitstream_buffer(inst, > src_size); > > > > + if (ret) { > > > > + dev_dbg(inst->dev->dev, > > > > + "update_bitstream_buffer fail: %d for > src buf > > > (%u)\n", > > > > + ret, vbuf->vb2_buf.index); > > > > + break; > > > > + } > > > > + > > > > + vpu_buf->consumed = true; > > > > + > > > > + /* Don't write buffers passed the last one while > draining. > > > */ > > > > + if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx, vbuf)) > { > > > > + dev_dbg(inst->dev->dev, "last src buffer > written to > > > the ring buffer\n"); > > > > + break; > > > > + } > > > > + > > > > + inst->queuing_num++; > > > > + list_del_init(&vpu_buf->list); > > > > + break; > > > > + } > > > > + > > > > + return ret; > > > > +} > > > > + > > > > +static void wave5_vpu_dec_feed_remaining(struct vpu_instance *inst) > { > > > > + int ret = 0; > > > > + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; > > > > + u32 fail_res = 0; > > > > + > > > > + mutex_lock(&inst->feed_lock); > > > > + ret = fill_ringbuffer(inst); > > > > + mutex_unlock(&inst->feed_lock); > > > > + if (ret) { > > > > + dev_warn(inst->dev->dev, "Filling ring buffer > failed\n"); > > > > + return; > > > > + } > > > > + > > > > + ret = start_decode(inst, &fail_res); > > > > + if (ret) { > > > > + dev_err(inst->dev->dev, > > > > + "Frame decoding on m2m context (%p), fail: %d > > > (result: %d)\n", > > > > + m2m_ctx, ret, fail_res); > > > > + } > > > > + > > > > + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); } > > > > + > > > > static int wave5_vpu_dec_enum_framesizes(struct file *f, void > > > > *fh, struct v4l2_frmsizeenum *fsize) { > > > > const struct vpu_format *vpu_fmt; @@ -794,11 +946,21 @@ > static > > > > int wave5_vpu_dec_stop(struct vpu_instance > > > *inst) > > > > } > > > > > > > > if (inst->state != VPU_INST_STATE_NONE) { > > > > + struct vb2_v4l2_buffer *vbuf; > > > > + struct vpu_src_buffer *vpu_buf; > > > > + > > > > /* > > > > * Temporarily release the state_spinlock so that > subsequent > > > > * calls do not block on a mutex while inside this > spinlock. > > > > */ > > > > spin_unlock_irqrestore(&inst->state_spinlock, flags); > > > > + vbuf = v4l2_m2m_last_src_buf(m2m_ctx); > > > > + if (vbuf) { > > > > + vpu_buf = wave5_to_vpu_src_buf(vbuf); > > > > + if (!vpu_buf->consumed) > > > > + wave5_vpu_dec_feed_remaining(inst); > > > > + } > > > > + > > > > ret = wave5_vpu_dec_set_eos_on_firmware(inst); > > > > if (ret) > > > > return ret; > > > > @@ -1116,115 +1278,6 @@ static int wave5_prepare_fb(struct > > > > vpu_instance > > > *inst) > > > > return 0; > > > > } > > > > > > > > -static int write_to_ringbuffer(struct vpu_instance *inst, void > > > > *buffer, > > > size_t buffer_size, > > > > - struct vpu_buf *ring_buffer, dma_addr_t > wr_ptr) > > > > -{ > > > > - size_t size; > > > > - size_t offset = wr_ptr - ring_buffer->daddr; > > > > - int ret; > > > > - > > > > - if (wr_ptr + buffer_size > ring_buffer->daddr + ring_buffer- > >size) > > > { > > > > - size = ring_buffer->daddr + ring_buffer->size - wr_ptr; > > > > - ret = wave5_vdi_write_memory(inst->dev, ring_buffer, > offset, > > > (u8 *)buffer, size); > > > > - if (ret < 0) > > > > - return ret; > > > > - > > > > - ret = wave5_vdi_write_memory(inst->dev, ring_buffer, 0, > (u8 > > > *)buffer + size, > > > > - buffer_size - size); > > > > - if (ret < 0) > > > > - return ret; > > > > - } else { > > > > - ret = wave5_vdi_write_memory(inst->dev, ring_buffer, > offset, > > > (u8 *)buffer, > > > > - buffer_size); > > > > - if (ret < 0) > > > > - return ret; > > > > - } > > > > - > > > > - return 0; > > > > -} > > > > - > > > > -static int fill_ringbuffer(struct vpu_instance *inst) -{ > > > > - struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; > > > > - struct v4l2_m2m_buffer *buf, *n; > > > > - int ret; > > > > - > > > > - if (m2m_ctx->last_src_buf) { > > > > - struct vpu_src_buffer *vpu_buf = > > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf); > > > > - > > > > - if (vpu_buf->consumed) { > > > > - dev_dbg(inst->dev->dev, "last src buffer > already > > > written\n"); > > > > - return 0; > > > > - } > > > > - } > > > > - > > > > - v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) { > > > > - struct vb2_v4l2_buffer *vbuf = &buf->vb; > > > > - struct vpu_src_buffer *vpu_buf = > wave5_to_vpu_src_buf(vbuf); > > > > - struct vpu_buf *ring_buffer = &inst->bitstream_vbuf; > > > > - size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, > 0); > > > > - void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); > > > > - dma_addr_t rd_ptr = 0; > > > > - dma_addr_t wr_ptr = 0; > > > > - size_t remain_size = 0; > > > > - > > > > - if (vpu_buf->consumed) { > > > > - dev_dbg(inst->dev->dev, "already copied src buf > (%u) > > > to the ring buffer\n", > > > > - vbuf->vb2_buf.index); > > > > - continue; > > > > - } > > > > - > > > > - if (!src_buf) { > > > > - dev_dbg(inst->dev->dev, > > > > - "%s: Acquiring kernel pointer to src buf > (%u), > > > fail\n", > > > > - __func__, vbuf->vb2_buf.index); > > > > - break; > > > > - } > > > > - > > > > - ret = wave5_vpu_dec_get_bitstream_buffer(inst, &rd_ptr, > > > &wr_ptr, &remain_size); > > > > - if (ret) { > > > > - /* Unable to acquire the mutex */ > > > > - dev_err(inst->dev->dev, "Getting the bitstream > buffer, > > > fail: %d\n", > > > > - ret); > > > > - return ret; > > > > - } > > > > - > > > > - dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad", > > > __func__, &rd_ptr, &wr_ptr); > > > > - > > > > - if (remain_size < src_size) { > > > > - dev_dbg(inst->dev->dev, > > > > - "%s: remaining size: %zu < source > size: %zu for > > > src buf (%u)\n", > > > > - __func__, remain_size, src_size, vbuf- > > > > vb2_buf.index); > > > > - break; > > > > - } > > > > - > > > > - ret = write_to_ringbuffer(inst, src_buf, src_size, > > > ring_buffer, wr_ptr); > > > > - if (ret) { > > > > - dev_err(inst->dev->dev, "Write src buf (%u) to > ring > > > buffer, fail: %d\n", > > > > - vbuf->vb2_buf.index, ret); > > > > - return ret; > > > > - } > > > > - > > > > - ret = wave5_vpu_dec_update_bitstream_buffer(inst, > src_size); > > > > - if (ret) { > > > > - dev_dbg(inst->dev->dev, > > > > - "update_bitstream_buffer fail: %d for > src buf > > > (%u)\n", > > > > - ret, vbuf->vb2_buf.index); > > > > - break; > > > > - } > > > > - > > > > - vpu_buf->consumed = true; > > > > - > > > > - /* Don't write buffers passed the last one while > draining. > > > */ > > > > - if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx, vbuf)) > { > > > > - dev_dbg(inst->dev->dev, "last src buffer > written to > > > the ring buffer\n"); > > > > - break; > > > > - } > > > > - } > > > > - > > > > - return 0; > > > > -} > > > > - > > > > static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb) { > > > > struct vpu_instance *inst = vb2_get_drv_priv(vb->vb2_queue); > @@ > > > > -1236,6 +1289,11 @@ static void wave5_vpu_dec_buf_queue_src(struct > > > vb2_buffer *vb) > > > > vbuf->sequence = inst->queued_src_buf_num++; > > > > > > > > v4l2_m2m_buf_queue(m2m_ctx, vbuf); > > > > + > > > > + INIT_LIST_HEAD(&vpu_buf->list); > > > > + mutex_lock(&inst->feed_lock); > > > > + list_add_tail(&vpu_buf->list, &inst->avail_src_bufs); > > > > + mutex_unlock(&inst->feed_lock); > > > > } > > > > > > > > static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@ > > > > -1287,10 +1345,13 @@ static void wave5_vpu_dec_buf_queue(struct > > > vb2_buffer *vb) > > > > __func__, vb->type, vb->index, vb2_plane_size(&vbuf- > >vb2_buf, > > > 0), > > > > vb2_plane_size(&vbuf->vb2_buf, 1), > vb2_plane_size(&vbuf- > > > > vb2_buf, 2)); > > > > > > > > - if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) > > > > + if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) { > > > > + if (inst->empty_queue) > > > > + inst->empty_queue = false; > > > > wave5_vpu_dec_buf_queue_src(vb); > > > > - else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) > > > > + } else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) { > > > > wave5_vpu_dec_buf_queue_dst(vb); > > > > + } > > > > } > > > > > > > > static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance > > > > *inst) @@ -1369,6 +1430,13 @@ static int streamoff_output(struct > > > vb2_queue *q) > > > > struct vb2_v4l2_buffer *buf; > > > > int ret; > > > > dma_addr_t new_rd_ptr; > > > > + struct vpu_src_buffer *vpu_buf, *tmp; > > > > + > > > > + inst->retry = false; > > > > + inst->queuing_num = 0; > > > > + > > > > + list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, > list) > > > > + list_del_init(&vpu_buf->list); > > > > > > > > while ((buf = v4l2_m2m_src_buf_remove(m2m_ctx))) { > > > > dev_dbg(inst->dev->dev, "%s: (Multiplanar) buf > type %4u | > > > index > > > > %4u\n", @@ -1445,6 +1513,7 @@ static void > > > > wave5_vpu_dec_stop_streaming(struct vb2_queue *q) > > > > > > > > dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type); > > > > pm_runtime_resume_and_get(inst->dev->dev); > > > > + inst->empty_queue = false; > > > > > > > > while (check_cmd) { > > > > struct queue_status_info q_status; @@ -1452,16 > +1521,13 @@ > > > > static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q) > > > > > > > > wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, > > > &q_status); > > > > > > > > - if (q_status.report_queue_count == 0) > > > > - break; > > > > - > > > > - if (wave5_vpu_wait_interrupt(inst, VPU_DEC_TIMEOUT) < > 0) > > > > + if ((inst->state == VPU_INST_STATE_STOP || > > > q_status.instance_queue_count == 0) && > > > > + q_status.report_queue_count == 0) > > > > break; > > > > > > > > if (wave5_vpu_dec_get_output_info(inst, > &dec_output_info)) > > > > dev_dbg(inst->dev->dev, "Getting decoding > results from > > > fw, fail\n"); > > > > } > > > > - > > > > v4l2_m2m_update_stop_streaming_state(m2m_ctx, q); > > > > > > > > if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) > > > > @@ -1548,13 +1614,24 @@ static void wave5_vpu_dec_device_run(void > *priv) > > > > struct queue_status_info q_status; > > > > u32 fail_res = 0; > > > > int ret = 0; > > > > + unsigned long flags; > > > > > > > > dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new > > > bitstream data", __func__); > > > > pm_runtime_resume_and_get(inst->dev->dev); > > > > - ret = fill_ringbuffer(inst); > > > > - if (ret) { > > > > - dev_warn(inst->dev->dev, "Filling ring buffer > failed\n"); > > > > - goto finish_job_and_return; > > > > + if (!inst->retry) { > > > > + mutex_lock(&inst->feed_lock); > > > > + ret = fill_ringbuffer(inst); > > > > + mutex_unlock(&inst->feed_lock); > > > > + if (ret < 0) { > > > > + dev_warn(inst->dev->dev, "Filling ring buffer > > > failed\n"); > > > > + goto finish_job_and_return; > > > > + } else if (!inst->eos && > > > > + inst->queuing_num == 0 && > > > > + inst->state == VPU_INST_STATE_PIC_RUN) { > > > > + dev_dbg(inst->dev->dev, "%s: no bitstream for > feeding, > > > so skip ", __func__); > > > > + inst->empty_queue = true; > > > > + goto finish_job_and_return; > > > > + } > > > > } > > > > > > > > switch (inst->state) { > > > > @@ -1590,7 +1667,9 @@ static void wave5_vpu_dec_device_run(void > *priv) > > > > * we had a chance to switch, which leads to an > invalid state > > > > * change. > > > > */ > > > > + spin_lock_irqsave(&inst->state_spinlock, flags); > > > > switch_state(inst, VPU_INST_STATE_PIC_RUN); > > > > + spin_unlock_irqrestore(&inst->state_spinlock, flags); > > > > > > > > /* > > > > * During DRC, the picture decoding remains pending, > so just > > > leave > > > > the job @@ -1605,12 +1684,14 @@ static void > > > wave5_vpu_dec_device_run(void *priv) > > > > ret = wave5_prepare_fb(inst); > > > > if (ret) { > > > > dev_warn(inst->dev->dev, "Framebuffer > preparation, > > > fail: %d\n", > > > > ret); > > > > + spin_lock_irqsave(&inst->state_spinlock, flags); > > > > switch_state(inst, VPU_INST_STATE_STOP); > > > > + spin_unlock_irqrestore(&inst->state_spinlock, > flags); > > > > break; > > > > } > > > > > > > > if (q_status.instance_queue_count) { > > > > - dev_dbg(inst->dev->dev, "%s: leave with active > job", > > > __func__); > > > > + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, > m2m_ctx); > > > > return; > > > > } > > > > > > > > @@ -1621,21 +1702,27 @@ static void wave5_vpu_dec_device_run(void > *priv) > > > > dev_err(inst->dev->dev, > > > > "Frame decoding on m2m context (%p), > fail: %d > > > (result: %d)\n", > > > > m2m_ctx, ret, fail_res); > > > > - break; > > > > + goto finish_job_and_return; > > > > } > > > > /* Return so that we leave this job active */ > > > > - dev_dbg(inst->dev->dev, "%s: leave with active job", > > > __func__); > > > > - return; > > > > - default: > > > > - WARN(1, "Execution of a job in state %s illegal.\n", > > > state_to_str(inst->state)); > > > > + if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) { > > > > + inst->retry = true; > > > > + } else { > > > > + inst->retry = false; > > > > + if (!inst->eos) > > > > + inst->queuing_num--; > > > > + } > > > > break; > > > > + default: > > > > + if (!v4l2_m2m_has_stopped(m2m_ctx)) > > > > + WARN(1, "Execution of a job in state %s > illegal.\n", > > > > + state_to_str(inst->state)); > > > > + return; > > > > } > > > > > > > > finish_job_and_return: > > > > + up(&inst->run_sem); > > > > dev_dbg(inst->dev->dev, "%s: leave and finish job", > __func__); > > > > - pm_runtime_mark_last_busy(inst->dev->dev); > > > > - pm_runtime_put_autosuspend(inst->dev->dev); > > > > - v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); > > > > } > > > > > > > > static void wave5_vpu_dec_job_abort(void *priv) @@ -1647,10 > > > > +1734,7 @@ static void wave5_vpu_dec_job_abort(void *priv) > > > > if (ret) > > > > return; > > > > > > > > - ret = wave5_vpu_dec_set_eos_on_firmware(inst); > > > > - if (ret) > > > > - dev_warn(inst->dev->dev, > > > > - "Setting EOS for the bitstream, fail: %d\n", > ret); > > > > + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, inst- > >v4l2_fh.m2m_ctx); > > > > } > > > > > > > > static int wave5_vpu_dec_job_ready(void *priv) @@ -1686,7 +1770,8 > > > > @@ static int wave5_vpu_dec_job_ready(void *priv) > > > > "No capture buffer ready to decode!\n"); > > > > break; > > > > } else if (!wave5_is_draining_or_eos(inst) && > > > > - !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) { > > > > + (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) || > > > > + inst->empty_queue)) { > > > > dev_dbg(inst->dev->dev, > > > > "No bitstream data to decode!\n"); > > > > break; > > > > @@ -1726,6 +1811,8 @@ static int wave5_vpu_open_dec(struct file > *filp) > > > > inst->ops = &wave5_vpu_dec_inst_ops; > > > > > > > > spin_lock_init(&inst->state_spinlock); > > > > + mutex_init(&inst->feed_lock); > > > > + INIT_LIST_HEAD(&inst->avail_src_bufs); > > > > > > > > inst->codec_info = kzalloc(sizeof(*inst->codec_info), > GFP_KERNEL); > > > > if (!inst->codec_info) > > > > @@ -1782,6 +1869,11 @@ static int wave5_vpu_open_dec(struct file > *filp) > > > > inst->xfer_func = V4L2_XFER_FUNC_DEFAULT; > > > > > > > > init_completion(&inst->irq_done); > > > > + ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), > GFP_KERNEL); > > > > + if (ret) { > > > > + dev_err(inst->dev->dev, "failed to allocate fifo\n"); > > > > + goto cleanup_inst; > > > > + } > > > > > > > > inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL); > > > > if (inst->id < 0) { > > > > @@ -1797,13 +1889,13 @@ static int wave5_vpu_open_dec(struct file > *filp) > > > > if (inst->dev->product_code != WAVE515_CODE) > > > > wave5_vdi_allocate_sram(inst->dev); > > > > > > > > + sema_init(&inst->run_sem, 1); > > > > + inst->run_thread = kthread_run(run_thread, inst, "run > thread"); > > > > + > > > > ret = mutex_lock_interruptible(&dev->dev_lock); > > > > if (ret) > > > > goto cleanup_inst; > > > > > > > > - if (list_empty(&dev->instances)) > > > > - pm_runtime_use_autosuspend(inst->dev->dev); > > > > - > > > > list_add_tail(&inst->list, &dev->instances); > > > > > > > > mutex_unlock(&dev->dev_lock); > > > > diff --git > > > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c > > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c > > > > index 1e5fc5f8b856..7f1aa392805f 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c > > > > @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file > *filp) > > > > inst->frame_rate = 30; > > > > > > > > init_completion(&inst->irq_done); > > > > + ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), > GFP_KERNEL); > > > > + if (ret) { > > > > + dev_err(inst->dev->dev, "failed to allocate fifo\n"); > > > > + goto cleanup_inst; > > > > + } > > > > > > > > inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL); > > > > if (inst->id < 0) { > > > > @@ -1774,9 +1779,6 @@ static int wave5_vpu_open_enc(struct file > *filp) > > > > if (ret) > > > > goto cleanup_inst; > > > > > > > > - if (list_empty(&dev->instances)) > > > > - pm_runtime_use_autosuspend(inst->dev->dev); > > > > - > > > > list_add_tail(&inst->list, &dev->instances); > > > > > > > > mutex_unlock(&dev->dev_lock); > > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c > > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu.c > > > > index 63a607d10433..e1e4be482d36 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c > > > > @@ -51,17 +51,24 @@ static void wave5_vpu_handle_irq(void *dev_id) > > > > u32 seq_done; > > > > u32 cmd_done; > > > > u32 irq_reason; > > > > - struct vpu_instance *inst; > > > > + u32 irq_subreason; > > > > + struct vpu_instance *inst, *tmp; > > > > struct vpu_device *dev = dev_id; > > > > + int val; > > > > + unsigned long flags; > > > > > > > > irq_reason = wave5_vdi_read_register(dev, > W5_VPU_VINT_REASON); > > > > seq_done = wave5_vdi_read_register(dev, > > > W5_RET_SEQ_DONE_INSTANCE_INFO); > > > > cmd_done = wave5_vdi_read_register(dev, > > > > W5_RET_QUEUE_CMD_DONE_INST); > > > > + > > > > + if (dev->irq < 0 && cmd_done > 0) > > > > + wave5_vdi_write_register(dev, > W5_RET_QUEUE_CMD_DONE_INST, 0); > > > > + > > > > wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, > irq_reason); > > > > wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1); > > > > > > > > - list_for_each_entry(inst, &dev->instances, list) { > > > > - > > > > + spin_lock_irqsave(&dev->irq_spinlock, flags); > > > > + list_for_each_entry_safe(inst, tmp, &dev->instances, list) { > > > > if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) || > > > > irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) { > > > > if (dev->product_code == WAVE515_CODE && @@ - > 82,14 > > > +89,22 @@ > > > > static void wave5_vpu_handle_irq(void *dev_id) > > > > irq_reason & BIT(INT_WAVE5_ENC_PIC)) { > > > > if (cmd_done & BIT(inst->id)) { > > > > cmd_done &= ~BIT(inst->id); > > > > - wave5_vdi_write_register(dev, > > > W5_RET_QUEUE_CMD_DONE_INST, > > > > - cmd_done); > > > > - inst->ops->finish_process(inst); > > > > + if (dev->irq >= 0) { > > > > + irq_subreason = > > > > + wave5_vdi_read_register(dev, > > > W5_VPU_VINT_REASON); > > > > + if (!(irq_subreason & > > > BIT(INT_WAVE5_DEC_PIC))) > > > > + > wave5_vdi_write_register(dev, > > > > + > > > W5_RET_QUEUE_CMD_DONE_INST, > > > > + > cmd_done); > > > > + } > > > > + val = BIT(INT_WAVE5_DEC_PIC); > > > > + kfifo_in(&inst->irq_status, &val, > sizeof(int)); > > > > } > > > > } > > > > - > > > > - wave5_vpu_clear_interrupt(inst, irq_reason); > > > > } > > > > + spin_unlock_irqrestore(&dev->irq_spinlock, flags); > > > > + > > > > + up(&dev->irq_sem); > > > > } > > > > > > > > static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id) @@ > > > > -121,6 +136,35 @@ static enum hrtimer_restart > > > wave5_vpu_timer_callback(struct hrtimer *timer) > > > > return HRTIMER_RESTART; > > > > } > > > > > > > > +static int irq_thread(void *data) { > > > > + struct vpu_device *dev = (struct vpu_device *)data; > > > > + struct vpu_instance *inst, *tmp; > > > > + int irq_status, ret; > > > > + > > > > + while (!kthread_should_stop()) { > > > > + if (down_interruptible(&dev->irq_sem)) > > > > + continue; > > > > + > > > > + if (kthread_should_stop()) > > > > + break; > > > > + > > > > + mutex_lock(&dev->irq_lock); > > > > + list_for_each_entry_safe(inst, tmp, &dev->instances, > list) { > > > > + while (kfifo_len(&inst->irq_status)) { > > > > + ret = kfifo_out(&inst->irq_status, > &irq_status, > > > sizeof(int)); > > > > + if (!ret) > > > > + break; > > > > + > > > > + inst->ops->finish_process(inst); > > > > + } > > > > + } > > > > + mutex_unlock(&dev->irq_lock); > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > static int wave5_vpu_load_firmware(struct device *dev, const char > > > *fw_name, > > > > u32 *revision) > > > > { > > > > @@ -224,6 +268,8 @@ static int wave5_vpu_probe(struct > > > > platform_device > > > > *pdev) > > > > > > > > mutex_init(&dev->dev_lock); > > > > mutex_init(&dev->hw_lock); > > > > + mutex_init(&dev->irq_lock); > > > > + spin_lock_init(&dev->irq_spinlock); > > > > dev_set_drvdata(&pdev->dev, dev); > > > > dev->dev = &pdev->dev; > > > > > > > > @@ -266,6 +312,9 @@ static int wave5_vpu_probe(struct > > > > platform_device > > > *pdev) > > > > } > > > > dev->product = wave5_vpu_get_product_id(dev); > > > > > > > > + sema_init(&dev->irq_sem, 1); > > > > + INIT_LIST_HEAD(&dev->instances); > > > > + dev->irq_thread = kthread_run(irq_thread, dev, "irq thread"); > > > > dev->irq = platform_get_irq(pdev, 0); > > > > if (dev->irq < 0) { > > > > dev_err(&pdev->dev, "failed to get irq resource, > falling > > > back to > > > > polling\n"); @@ -288,7 +337,6 @@ static int wave5_vpu_probe(struct > > > platform_device *pdev) > > > > } > > > > } > > > > > > > > - INIT_LIST_HEAD(&dev->instances); > > > > ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev); > > > > if (ret) { > > > > dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", > ret); > > > @@ > > > > -351,6 +399,12 @@ static void wave5_vpu_remove(struct > > > > platform_device > > > > *pdev) { > > > > struct vpu_device *dev = dev_get_drvdata(&pdev->dev); > > > > > > > > + if (dev->irq_thread) { > > > > + kthread_stop(dev->irq_thread); > > > > + up(&dev->irq_sem); > > > > + dev->irq_thread = NULL; > > > > + } > > > > + > > > > if (dev->irq < 0) { > > > > kthread_destroy_worker(dev->worker); > > > > hrtimer_cancel(&dev->hrtimer); > > > > @@ -361,6 +415,7 @@ static void wave5_vpu_remove(struct > > > > platform_device *pdev) > > > > > > > > mutex_destroy(&dev->dev_lock); > > > > mutex_destroy(&dev->hw_lock); > > > > + mutex_destroy(&dev->irq_lock); > > > > reset_control_assert(dev->resets); > > > > clk_bulk_disable_unprepare(dev->num_clks, dev->clks); > > > > wave5_vpu_enc_unregister_device(dev); > > > > diff --git > > > > a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c > > > > b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c > > > > index e16b990041c2..9a28d0f770bc 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c > > > > @@ -197,8 +197,6 @@ int wave5_vpu_dec_close(struct vpu_instance > > > > *inst, > > > u32 *fail_res) > > > > int retry = 0; > > > > struct vpu_device *vpu_dev = inst->dev; > > > > int i; > > > > - int inst_count = 0; > > > > - struct vpu_instance *inst_elm; > > > > > > > > *fail_res = 0; > > > > if (!inst->codec_info) > > > > @@ -223,6 +221,16 @@ int wave5_vpu_dec_close(struct vpu_instance > > > > *inst, > > > u32 *fail_res) > > > > retry++ >= MAX_FIRMWARE_CALL_RETRY) { > > > > ret = -ETIMEDOUT; > > > > goto unlock_and_return; > > > > + } else if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING) > { > > > > + struct dec_output_info dec_info; > > > > + > > > > + mutex_unlock(&vpu_dev->hw_lock); > > > > + wave5_vpu_dec_get_output_info(inst, &dec_info); > > > > + ret = mutex_lock_interruptible(&vpu_dev- > >hw_lock); > > > > + if (ret) { > > > > + pm_runtime_put_sync(inst->dev->dev); > > > > + return ret; > > > > + } > > > > } > > > > } while (ret != 0); > > > > > > > > @@ -240,11 +248,6 @@ int wave5_vpu_dec_close(struct vpu_instance > > > > *inst, u32 *fail_res) > > > > > > > > wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task); > > > > > > > > - list_for_each_entry(inst_elm, &vpu_dev->instances, list) > > > > - inst_count++; > > > > - if (inst_count == 1) > > > > - pm_runtime_dont_use_autosuspend(vpu_dev->dev); > > > > - > > > > unlock_and_return: > > > > mutex_unlock(&vpu_dev->hw_lock); > > > > pm_runtime_put_sync(inst->dev->dev); > > > > @@ -710,8 +713,6 @@ int wave5_vpu_enc_close(struct vpu_instance > > > > *inst, > > > u32 *fail_res) > > > > int ret; > > > > int retry = 0; > > > > struct vpu_device *vpu_dev = inst->dev; > > > > - int inst_count = 0; > > > > - struct vpu_instance *inst_elm; > > > > > > > > *fail_res = 0; > > > > if (!inst->codec_info) > > > > @@ -755,11 +756,6 @@ int wave5_vpu_enc_close(struct vpu_instance > > > > *inst, u32 *fail_res) > > > > > > > > wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task); > > > > > > > > - list_for_each_entry(inst_elm, &vpu_dev->instances, list) > > > > - inst_count++; > > > > - if (inst_count == 1) > > > > - pm_runtime_dont_use_autosuspend(vpu_dev->dev); > > > > - > > > > mutex_unlock(&vpu_dev->hw_lock); > > > > pm_runtime_put_sync(inst->dev->dev); > > > > > > > > diff --git > > > > a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h > > > > b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h > > > > index 45615c15beca..d26590141562 100644 > > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h > > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h > > > > @@ -8,6 +8,7 @@ > > > > #ifndef VPUAPI_H_INCLUDED > > > > #define VPUAPI_H_INCLUDED > > > > > > > > +#include <linux/kfifo.h> > > > > #include <linux/idr.h> > > > > #include <linux/genalloc.h> > > > > #include <media/v4l2-device.h> > > > > @@ -163,7 +164,7 @@ enum set_param_option { > > > > > > > > #define BUFFER_MARGIN 4096 > > > > > > > > -#define MAX_FIRMWARE_CALL_RETRY 10 > > > > +#define MAX_FIRMWARE_CALL_RETRY 30 > > > > > > > > #define VDI_LITTLE_ENDIAN 0x0 > > > > > > > > @@ -747,6 +748,7 @@ struct vpu_device { > > > > struct video_device *video_dev_enc; > > > > struct mutex dev_lock; /* lock for the src, dst v4l2 queues > */ > > > > struct mutex hw_lock; /* lock hw configurations */ > > > > + struct mutex irq_lock; > > > > int irq; > > > > enum product_id product; > > > > struct vpu_attr attr; > > > > @@ -764,7 +766,10 @@ struct vpu_device { > > > > struct kthread_worker *worker; > > > > int vpu_poll_interval; > > > > int num_clks; > > > > + struct task_struct *irq_thread; > > > > + struct semaphore irq_sem; > > > > struct reset_control *resets; > > > > + spinlock_t irq_spinlock; /* protect instances list */ > > > > }; > > > > > > > > struct vpu_instance; > > > > @@ -787,7 +792,7 @@ struct vpu_instance { > > > > enum v4l2_xfer_func xfer_func; > > > > enum v4l2_ycbcr_encoding ycbcr_enc; > > > > enum v4l2_quantization quantization; > > > > - > > > > + struct kfifo irq_status; > > > > enum vpu_instance_state state; > > > > enum vpu_instance_type type; > > > > const struct vpu_instance_ops *ops; @@ -812,11 +817,17 @@ > struct > > > > vpu_instance { > > > > bool cbcr_interleave; > > > > bool nv21; > > > > bool eos; > > > > + bool retry; > > > > + bool empty_queue; > > > > + int queuing_num; > > > > + struct mutex feed_lock; /* lock for feeding bitstream > buffers */ > > > > struct vpu_buf bitstream_vbuf; > > > > dma_addr_t last_rd_ptr; > > > > size_t remaining_consumed_bytes; > > > > bool needs_reallocation; > > > > > > > > + struct semaphore run_sem; > > > > + struct task_struct *run_thread; > > > > unsigned int min_src_buf_count; > > > > unsigned int rot_angle; > > > > unsigned int mirror_direction; > >