RE: [PATCH v1 3/5] media: chips-media: wave5: Improve performance of decoder


 



Hi Nicolas


> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx>
> Sent: Thursday, December 19, 2024 1:42 AM
> To: jackson.lee <jackson.lee@xxxxxxxxxxxxxxx>; mchehab@xxxxxxxxxx;
> hverkuil-cisco@xxxxxxxxx; sebastian.fricke@xxxxxxxxxxxxx;
> bob.beckett@xxxxxxxxxxxxx; dafna.hirschfeld@xxxxxxxxxxxxx
> Cc: linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; lafley.kim
> <lafley.kim@xxxxxxxxxxxxxxx>; b-brnich@xxxxxx; hverkuil@xxxxxxxxx; Nas
> Chung <nas.chung@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH v1 3/5] media: chips-media: wave5: Improve performance
> of decoder
> 
> Hi Jackson,
> 
> On Wednesday, 18 December 2024 at 05:05 +0000, jackson.lee wrote:
> > Hi Nicolas
> >
> > Yes, this patch is too complex to review as it stands,
> > so I will split it into three patch sets.
> >
> > 1. Main patch to improve performance of the decoder.
> >     As you know, in the previous flow the driver had to wait for a
> >     decoded frame after feeding a bitstream, and only then was the
> >     next bitstream fed into the firmware.
> >     To achieve this goal, the output and capture queues need to work
> >     asynchronously.
> >     So we had to remove v4l2_m2m_job_finish from
> >     wave5_vpu_dec_finish_decode and put it into device_run to
> >     handle decoding.
> >
> > 2. After applying the decoder performance changes, we had a problem
> >     with high CPU load: it was more than 4 times higher than with
> >     the previous code.
> >     The root cause was that device_run was called many times even
> >     when there was no bitstream to queue.
> 
> This needs changes in the job_ready() op, which is what decides if the
> device must run or not. It seems difficult to do 2. separately from 1.
> though, but perhaps you can do that before 1.
> 

To reduce CPU load, job_ready() was modified.
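
As a rough illustration of that change, here is a minimal userspace sketch of
the gating condition, modelled on the job_ready() hunk in the quoted patch.
The struct dec_model type and the model_* function names are hypothetical
stand-ins for this example only; they are not driver code, and the real check
runs on struct vpu_instance and the m2m context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the job_ready() check consults. */
struct dec_model {
	bool draining_or_eos;   /* models wave5_is_draining_or_eos() */
	unsigned int src_ready; /* models v4l2_m2m_num_src_bufs_ready() */
	unsigned int dst_ready; /* models the capture buffer check */
	bool empty_queue;       /* set when device_run found nothing to feed */
};

/* Returns true when the model says device_run should be scheduled. */
bool model_job_ready(const struct dec_model *m)
{
	assert(m != NULL);

	/* No capture buffer ready to decode into. */
	if (m->dst_ready == 0)
		return false;

	/*
	 * Outside of draining/EOS, refuse to run when there is no source
	 * buffer, or when the last run already found nothing left to feed.
	 */
	if (!m->draining_or_eos && (m->src_ready == 0 || m->empty_queue))
		return false;

	return true;
}

/* Queuing a new source buffer clears the flag, as in the buf_queue hunk. */
void model_buf_queue_src(struct dec_model *m)
{
	m->src_ready++;
	m->empty_queue = false;
}
```

With this model, an instance whose empty_queue flag is set is reported as not
ready until another source buffer is queued, which is what keeps device_run
from being called repeatedly when there is nothing to feed.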


Thanks



> >
> > 3. Fix a NULL dereference seen while testing fluster with more than 2
> >     decoders.
> >     When multiple instances are created/destroyed, many interrupts
> >     happen or decoder structures are removed.
> >     "struct vpu_instance" is shared across the whole decoder flow, so
> >     if the structure is not protected by a lock, a NULL dereference
> >     can sometimes happen.
> >     The IRQ handler was split into two phases and a lock was added as
> >     well.
> 
> This #3 is what introduced most of the refactoring, perhaps you want this
> to come first. In short, if you move code around, do that separately, then
> add the locking in preparation for the newly introduced concurrency from 1.
> 
> Thanks for this information, it will certainly help in the next review.
> 
> regards,
> Nicolas
> 
> >
> >
> > > -----Original Message-----
> > > From: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx>
> > > Sent: Saturday, December 14, 2024 4:48 AM
> > > To: jackson.lee <jackson.lee@xxxxxxxxxxxxxxx>; mchehab@xxxxxxxxxx;
> > > hverkuil-cisco@xxxxxxxxx; sebastian.fricke@xxxxxxxxxxxxx;
> > > bob.beckett@xxxxxxxxxxxxx; dafna.hirschfeld@xxxxxxxxxxxxx
> > > Cc: linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > lafley.kim <lafley.kim@xxxxxxxxxxxxxxx>; b-brnich@xxxxxx;
> > > hverkuil@xxxxxxxxx; Nas Chung <nas.chung@xxxxxxxxxxxxxxx>
> > > Subject: Re: [PATCH v1 3/5] media: chips-media: wave5: Improve
> > > performance of decoder
> > >
> > > > On Monday, 09 December 2024 at 14:36 +0900, Jackson.lee wrote:
> > > > The existing way for decoding frames was to wait until each frame
> > > > was decoded after feeding a bitstream. As a result, performance
> > > > was low and Wave5 could not achieve max pixel processing rate.
> > > >
> > > > > Update driver to use an asynchronous approach for decoding and
> > > > > feeding a bitstream in order to achieve full capabilities of the
> > > > > device.
> > >
> > > > That patch is just massive and difficult to review. It makes an
> > > > important change to the driver design, which requires a careful
> > > > threading review. I'm not sure I can provide a good review without
> > > > this change being better introduced and broken down. A few things:
> > >
> > > - Try and explain the threading changes in the commits, and perhaps
> > >   introduce the change in the cover letter
> >
> >
> > > - Split the refactoring of the ring buffer handling in separate non-
> > >   functional change patches
> >
> > > I didn't modify the code that handles the ring buffer; it was only
> > > moved to prevent a compile error.
> >
> >
> > > > - Change the job_ready() logic and use the m2m thread instead of
> > > >   creating another one and wasting context switches
> >
> >
> > I will do that.
> >
> >
> > Thanks
> > Jackson
> >
> >
> > >
> > > > What needs a good theoretical review is the handling of the
> > > > "abort()" flow.
> > > > We had a big struggle with that while helping with the upstreaming.
> > > > That is more or less why, due to time constraints, we went for
> > > > this sub-optimal method of idling the HW between jobs. Don't get
> > > > me wrong, I'm very happy to see work being done on that.
> > > >
> > > > In parallel, do you agree that 1/5 and 2/5 could be applied
> > > > already? If so, can you send these two in their own series, and
> > > > please add the missing Fixes: tags, which are needed for
> > > > backporting.
> > >
> > > regards,
> > > Nicolas
> > >
> > > >
> > > > Signed-off-by: Jackson.lee <jackson.lee@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Nas Chung <nas.chung@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  .../platform/chips-media/wave5/wave5-helper.c |  20 +-
> > > >  .../chips-media/wave5/wave5-vpu-dec.c         | 400 +++++++++++------
> -
> > > >  .../chips-media/wave5/wave5-vpu-enc.c         |   8 +-
> > > >  .../platform/chips-media/wave5/wave5-vpu.c    |  73 +++-
> > > >  .../platform/chips-media/wave5/wave5-vpuapi.c |  24 +-
> > > > .../platform/chips-media/wave5/wave5-vpuapi.h |  15 +-
> > > >  6 files changed, 355 insertions(+), 185 deletions(-)
> > > >
> > > > diff --git
> > > > a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > > index 2c9d8cbca6e4..83f30ad36e4f 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > > @@ -30,7 +30,6 @@ const char *state_to_str(enum vpu_instance_state
> > > > state)  void wave5_cleanup_instance(struct vpu_instance *inst)  {
> > > >  	int i;
> > > > -
> > >
> > > > nit: I kind of liked that blank line, but it's yours to decide what
> > > > you like best.
> > >
> > > >  	/*
> > > >  	 * For Wave515 SRAM memory is allocated at
> > > >  	 * wave5_vpu_dec_register_device() and freed at @@ -49,7
> +48,9
> > > > @@ void wave5_cleanup_instance(struct vpu_instance *inst)
> > > >  		v4l2_fh_del(&inst->v4l2_fh);
> > > >  		v4l2_fh_exit(&inst->v4l2_fh);
> > > >  	}
> > > > -	list_del_init(&inst->list);
> > > > +
> > > > +	kfifo_free(&inst->irq_status);
> > > > +	mutex_destroy(&inst->feed_lock);
> > > >  	ida_free(&inst->dev->inst_ida, inst->id);
> > > >  	kfree(inst->codec_info);
> > > >  	kfree(inst);
> > > > @@ -61,8 +62,22 @@ int wave5_vpu_release_device(struct file *filp,
> {
> > > >  	struct vpu_instance *inst = wave5_to_vpu_inst(filp-
> >private_data);
> > > >  	int ret = 0;
> > > > +	unsigned long flags;
> > > > +
> > > > +	if (inst->run_thread) {
> > > > +		kthread_stop(inst->run_thread);
> > > > +		up(&inst->run_sem);
> > > > +		inst->run_thread = NULL;
> > > > +	}
> > > >
> > > >  	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
> > > > +	ret = mutex_lock_interruptible(&inst->dev->irq_lock);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +	spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
> > > > +	list_del_init(&inst->list);
> > > > +	spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
> > > > +	mutex_unlock(&inst->dev->irq_lock);
> > > >  	if (inst->state != VPU_INST_STATE_NONE) {
> > > >  		u32 fail_res;
> > > >
> > > > @@ -79,7 +94,6 @@ int wave5_vpu_release_device(struct file *filp,
> > > >  	}
> > > >
> > > >  	wave5_cleanup_instance(inst);
> > > > -
> > > >  	return ret;
> > > >  }
> > > >
> > > > diff --git
> > > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > index d3ff420c52ce..b0e1f94bdb30 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > @@ -6,6 +6,9 @@
> > > >   */
> > > >
> > > >  #include <linux/pm_runtime.h>
> > > > +#include <linux/delay.h>
> > > > +#include <linux/timer.h>
> > > > +#include <linux/atomic.h>
> > > >  #include "wave5-helper.h"
> > > >
> > > >  #define VPU_DEC_DEV_NAME "C&M Wave5 VPU decoder"
> > > > @@ -101,6 +104,24 @@ static const struct vpu_format
> > > dec_fmt_list[FMT_TYPES][MAX_FMTS] = {
> > > >  	}
> > > >  };
> > > >
> > > > +static int run_thread(void *data) {
> > > > +	struct vpu_instance *inst = (struct vpu_instance *)data;
> > > > +	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > +
> > > > +	while (!kthread_should_stop()) {
> > > > +		if (down_interruptible(&inst->run_sem))
> > > > +			continue;
> > > > +
> > > > +		if (kthread_should_stop())
> > > > +			break;
> > > > +
> > > > +		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Make sure that the state switch is allowed and add logging for
> > > debugging
> > > >   * purposes
> > > > @@ -230,7 +251,6 @@ static int start_decode(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  		switch_state(inst, VPU_INST_STATE_STOP);
> > > >
> > > >  		dev_dbg(inst->dev->dev, "%s: pic run failed / finish
> job",
> > > __func__);
> > > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > >  	}
> > > >
> > > >  	return ret;
> > > > @@ -347,7 +367,6 @@ static void wave5_vpu_dec_finish_decode(struct
> > > vpu_instance *inst)
> > > >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> > > >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> > > >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > > > -	struct queue_status_info q_status;
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from
> firmware.",
> > > > __func__);
> > > >
> > > > @@ -360,11 +379,22 @@ static void
> > > > wave5_vpu_dec_finish_decode(struct
> > > > vpu_instance *inst)
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad",
> __func__,
> > > &dec_info.rd_ptr,
> > > >  		&dec_info.wr_ptr);
> > > > -	wave5_handle_src_buffer(inst, dec_info.rd_ptr);
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: dec_info dec_idx %i
> disp_idx %i",
> > > __func__,
> > > >  		dec_info.index_frame_decoded,
> dec_info.index_frame_display);
> > > >
> > > > +	if (inst->std == W_AVC_DEC &&
> > > > +	    dec_info.index_frame_decoded == DECODED_IDX_FLAG_SKIP &&
> > > > +	    dec_info.index_frame_display == DISPLAY_IDX_FLAG_NO_FB) {
> > > > +		struct vb2_v4l2_buffer *src_buf =
> > > v4l2_m2m_src_buf_remove(m2m_ctx);
> > > > +
> > > > +		if (src_buf)
> > > > +			v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	wave5_handle_src_buffer(inst, dec_info.rd_ptr);
> > > > +
> > > >  	if (!vb2_is_streaming(dst_vq)) {
> > > >  		dev_dbg(inst->dev->dev, "%s: capture is not
> streaming..",
> > > __func__);
> > > >  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); @@ -
> 441,20
> > > +471,6
> > > > @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
> > > >  		}
> > > >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > > >  	}
> > > > -
> > > > -	/*
> > > > -	 * During a resolution change and while draining, the
> firmware may
> > > flush
> > > > -	 * the reorder queue regardless of having a matching decoding
> > > operation
> > > > -	 * pending. Only terminate the job if there are no more IRQ
> coming.
> > > > -	 */
> > > > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS,
> &q_status);
> > > > -	if (q_status.report_queue_count == 0 &&
> > > > -	    (q_status.instance_queue_count == 0 ||
> > > dec_info.sequence_changed)) {
> > > > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n",
> __func__);
> > > > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > > > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > > -	}
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_querycap(struct file *file, void *fh,
> > > > struct v4l2_capability *cap) @@ -465,6 +481,142 @@ static int
> > > wave5_vpu_dec_querycap(struct file *file, void *fh, struct
> > > v4l2_capab
> > > >  	return 0;
> > > >  }
> > > >
> > > > +static int write_to_ringbuffer(struct vpu_instance *inst, void
> > > > +*buffer,
> > > size_t buffer_size,
> > > > +			       struct vpu_buf *ring_buffer, dma_addr_t
> wr_ptr)
> > > {
> > > > +	size_t size;
> > > > +	size_t offset = wr_ptr - ring_buffer->daddr;
> > > > +	int ret;
> > > > +
> > > > +	if (wr_ptr + buffer_size > ring_buffer->daddr +
> > > > +ring_buffer->size)
> > > {
> > > > +		size = ring_buffer->daddr + ring_buffer->size - wr_ptr;
> > > > +		ret = wave5_vdi_write_memory(inst->dev, ring_buffer,
> offset,
> > > (u8 *)buffer, size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +
> > > > +		ret = wave5_vdi_write_memory(inst->dev, ring_buffer, 0,
> (u8
> > > *)buffer + size,
> > > > +					     buffer_size - size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +	} else {
> > > > +		ret = wave5_vdi_write_memory(inst->dev, ring_buffer,
> offset,
> > > (u8 *)buffer,
> > > > +					     buffer_size);
> > > > +		if (ret < 0)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int fill_ringbuffer(struct vpu_instance *inst) {
> > > > +	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > +	struct vpu_src_buffer *vpu_buf;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (m2m_ctx->last_src_buf)  {
> > > > +		struct vpu_src_buffer *vpu_buf =
> > > > +wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > > +
> > > > +		if (vpu_buf->consumed) {
> > > > +			dev_dbg(inst->dev->dev, "last src buffer
> already
> > > written\n");
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> > > > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf-
> >v4l2_m2m_buf.vb;
> > > > +		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> > > > +		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf,
> 0);
> > > > +		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
> > > > +		dma_addr_t rd_ptr = 0;
> > > > +		dma_addr_t wr_ptr = 0;
> > > > +		size_t remain_size = 0;
> > > > +
> > > > +		if (vpu_buf->consumed) {
> > > > +			dev_dbg(inst->dev->dev, "already copied src buf
> (%u)
> > > to the ring buffer\n",
> > > > +				vbuf->vb2_buf.index);
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		if (!src_buf) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"%s: Acquiring kernel pointer to src buf
> (%u),
> > > fail\n",
> > > > +				__func__, vbuf->vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		ret = wave5_vpu_dec_get_bitstream_buffer(inst, &rd_ptr,
> > > &wr_ptr, &remain_size);
> > > > +		if (ret) {
> > > > +			/* Unable to acquire the mutex */
> > > > +			dev_err(inst->dev->dev, "Getting the bitstream
> buffer,
> > > fail: %d\n",
> > > > +				ret);
> > > > +			return ret;
> > > > +		}
> > > > +
> > > > +		dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad",
> > > __func__,
> > > > +&rd_ptr, &wr_ptr);
> > > > +
> > > > +		if (remain_size < src_size) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"%s: remaining size: %zu < source
> size: %zu for
> > > src buf (%u)\n",
> > > > +				__func__, remain_size, src_size, vbuf-
> > > > vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		ret = write_to_ringbuffer(inst, src_buf, src_size,
> > > ring_buffer, wr_ptr);
> > > > +		if (ret) {
> > > > +			dev_err(inst->dev->dev, "Write src buf (%u) to
> ring
> > > buffer, fail: %d\n",
> > > > +				vbuf->vb2_buf.index, ret);
> > > > +			return ret;
> > > > +		}
> > > > +
> > > > +		ret = wave5_vpu_dec_update_bitstream_buffer(inst,
> src_size);
> > > > +		if (ret) {
> > > > +			dev_dbg(inst->dev->dev,
> > > > +				"update_bitstream_buffer fail: %d for
> src buf
> > > (%u)\n",
> > > > +				ret, vbuf->vb2_buf.index);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		vpu_buf->consumed = true;
> > > > +
> > > > +		/* Don't write buffers passed the last one while
> draining.
> > > */
> > > > +		if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx, vbuf))
> {
> > > > +			dev_dbg(inst->dev->dev, "last src buffer
> written to
> > > the ring buffer\n");
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		inst->queuing_num++;
> > > > +		list_del_init(&vpu_buf->list);
> > > > +		break;
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void wave5_vpu_dec_feed_remaining(struct vpu_instance *inst)
> {
> > > > +	int ret = 0;
> > > > +	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > +	u32 fail_res = 0;
> > > > +
> > > > +	mutex_lock(&inst->feed_lock);
> > > > +	ret = fill_ringbuffer(inst);
> > > > +	mutex_unlock(&inst->feed_lock);
> > > > +	if (ret) {
> > > > +		dev_warn(inst->dev->dev, "Filling ring buffer
> failed\n");
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	ret = start_decode(inst, &fail_res);
> > > > +	if (ret) {
> > > > +		dev_err(inst->dev->dev,
> > > > +			"Frame decoding on m2m context (%p), fail: %d
> > > (result: %d)\n",
> > > > +			m2m_ctx, ret, fail_res);
> > > > +	}
> > > > +
> > > > +	v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); }
> > > > +
> > > >  static int wave5_vpu_dec_enum_framesizes(struct file *f, void
> > > > *fh, struct v4l2_frmsizeenum *fsize)  {
> > > >  	const struct vpu_format *vpu_fmt; @@ -794,11 +946,21 @@
> static
> > > > int wave5_vpu_dec_stop(struct vpu_instance
> > > *inst)
> > > >  	}
> > > >
> > > >  	if (inst->state != VPU_INST_STATE_NONE) {
> > > > +		struct vb2_v4l2_buffer *vbuf;
> > > > +		struct vpu_src_buffer *vpu_buf;
> > > > +
> > > >  		/*
> > > >  		 * Temporarily release the state_spinlock so that
> subsequent
> > > >  		 * calls do not block on a mutex while inside this
> spinlock.
> > > >  		 */
> > > >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > > > +		vbuf = v4l2_m2m_last_src_buf(m2m_ctx);
> > > > +		if (vbuf) {
> > > > +			vpu_buf = wave5_to_vpu_src_buf(vbuf);
> > > > +			if (!vpu_buf->consumed)
> > > > +				wave5_vpu_dec_feed_remaining(inst);
> > > > +		}
> > > > +
> > > >  		ret = wave5_vpu_dec_set_eos_on_firmware(inst);
> > > >  		if (ret)
> > > >  			return ret;
> > > > @@ -1116,115 +1278,6 @@ static int wave5_prepare_fb(struct
> > > > vpu_instance
> > > *inst)
> > > >  	return 0;
> > > >  }
> > > >
> > > > -static int write_to_ringbuffer(struct vpu_instance *inst, void
> > > > *buffer,
> > > size_t buffer_size,
> > > > -			       struct vpu_buf *ring_buffer, dma_addr_t
> wr_ptr)
> > > > -{
> > > > -	size_t size;
> > > > -	size_t offset = wr_ptr - ring_buffer->daddr;
> > > > -	int ret;
> > > > -
> > > > -	if (wr_ptr + buffer_size > ring_buffer->daddr + ring_buffer-
> >size)
> > > {
> > > > -		size = ring_buffer->daddr + ring_buffer->size - wr_ptr;
> > > > -		ret = wave5_vdi_write_memory(inst->dev, ring_buffer,
> offset,
> > > (u8 *)buffer, size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -
> > > > -		ret = wave5_vdi_write_memory(inst->dev, ring_buffer, 0,
> (u8
> > > *)buffer + size,
> > > > -					     buffer_size - size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -	} else {
> > > > -		ret = wave5_vdi_write_memory(inst->dev, ring_buffer,
> offset,
> > > (u8 *)buffer,
> > > > -					     buffer_size);
> > > > -		if (ret < 0)
> > > > -			return ret;
> > > > -	}
> > > > -
> > > > -	return 0;
> > > > -}
> > > > -
> > > > -static int fill_ringbuffer(struct vpu_instance *inst) -{
> > > > -	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > -	struct v4l2_m2m_buffer *buf, *n;
> > > > -	int ret;
> > > > -
> > > > -	if (m2m_ctx->last_src_buf)  {
> > > > -		struct vpu_src_buffer *vpu_buf =
> > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > > -
> > > > -		if (vpu_buf->consumed) {
> > > > -			dev_dbg(inst->dev->dev, "last src buffer
> already
> > > written\n");
> > > > -			return 0;
> > > > -		}
> > > > -	}
> > > > -
> > > > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > > > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > > > -		struct vpu_src_buffer *vpu_buf =
> wave5_to_vpu_src_buf(vbuf);
> > > > -		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> > > > -		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf,
> 0);
> > > > -		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
> > > > -		dma_addr_t rd_ptr = 0;
> > > > -		dma_addr_t wr_ptr = 0;
> > > > -		size_t remain_size = 0;
> > > > -
> > > > -		if (vpu_buf->consumed) {
> > > > -			dev_dbg(inst->dev->dev, "already copied src buf
> (%u)
> > > to the ring buffer\n",
> > > > -				vbuf->vb2_buf.index);
> > > > -			continue;
> > > > -		}
> > > > -
> > > > -		if (!src_buf) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"%s: Acquiring kernel pointer to src buf
> (%u),
> > > fail\n",
> > > > -				__func__, vbuf->vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		ret = wave5_vpu_dec_get_bitstream_buffer(inst, &rd_ptr,
> > > &wr_ptr, &remain_size);
> > > > -		if (ret) {
> > > > -			/* Unable to acquire the mutex */
> > > > -			dev_err(inst->dev->dev, "Getting the bitstream
> buffer,
> > > fail: %d\n",
> > > > -				ret);
> > > > -			return ret;
> > > > -		}
> > > > -
> > > > -		dev_dbg(inst->dev->dev, "%s: rd_ptr %pad wr_ptr %pad",
> > > __func__, &rd_ptr, &wr_ptr);
> > > > -
> > > > -		if (remain_size < src_size) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"%s: remaining size: %zu < source
> size: %zu for
> > > src buf (%u)\n",
> > > > -				__func__, remain_size, src_size, vbuf-
> > > > vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		ret = write_to_ringbuffer(inst, src_buf, src_size,
> > > ring_buffer, wr_ptr);
> > > > -		if (ret) {
> > > > -			dev_err(inst->dev->dev, "Write src buf (%u) to
> ring
> > > buffer, fail: %d\n",
> > > > -				vbuf->vb2_buf.index, ret);
> > > > -			return ret;
> > > > -		}
> > > > -
> > > > -		ret = wave5_vpu_dec_update_bitstream_buffer(inst,
> src_size);
> > > > -		if (ret) {
> > > > -			dev_dbg(inst->dev->dev,
> > > > -				"update_bitstream_buffer fail: %d for
> src buf
> > > (%u)\n",
> > > > -				ret, vbuf->vb2_buf.index);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		vpu_buf->consumed = true;
> > > > -
> > > > -		/* Don't write buffers passed the last one while
> draining.
> > > */
> > > > -		if (v4l2_m2m_is_last_draining_src_buf(m2m_ctx, vbuf))
> {
> > > > -			dev_dbg(inst->dev->dev, "last src buffer
> written to
> > > the ring buffer\n");
> > > > -			break;
> > > > -		}
> > > > -	}
> > > > -
> > > > -	return 0;
> > > > -}
> > > > -
> > > >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)  {
> > > >  	struct vpu_instance *inst = vb2_get_drv_priv(vb->vb2_queue);
> @@
> > > > -1236,6 +1289,11 @@ static void wave5_vpu_dec_buf_queue_src(struct
> > > vb2_buffer *vb)
> > > >  	vbuf->sequence = inst->queued_src_buf_num++;
> > > >
> > > >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > > > +
> > > > +	INIT_LIST_HEAD(&vpu_buf->list);
> > > > +	mutex_lock(&inst->feed_lock);
> > > > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > > > +	mutex_unlock(&inst->feed_lock);
> > > >  }
> > > >
> > > >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@
> > > > -1287,10 +1345,13 @@ static void wave5_vpu_dec_buf_queue(struct
> > > vb2_buffer *vb)
> > > >  		__func__, vb->type, vb->index, vb2_plane_size(&vbuf-
> >vb2_buf,
> > > 0),
> > > >  		vb2_plane_size(&vbuf->vb2_buf, 1),
> vb2_plane_size(&vbuf-
> > > > vb2_buf, 2));
> > > >
> > > > -	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> > > > +	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > > > +		if (inst->empty_queue)
> > > > +			inst->empty_queue = false;
> > > >  		wave5_vpu_dec_buf_queue_src(vb);
> > > > -	else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
> > > > +	} else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> > > >  		wave5_vpu_dec_buf_queue_dst(vb);
> > > > +	}
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance
> > > > *inst) @@ -1369,6 +1430,13 @@ static int streamoff_output(struct
> > > vb2_queue *q)
> > > >  	struct vb2_v4l2_buffer *buf;
> > > >  	int ret;
> > > >  	dma_addr_t new_rd_ptr;
> > > > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > > > +
> > > > +	inst->retry = false;
> > > > +	inst->queuing_num = 0;
> > > > +
> > > > +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs,
> list)
> > > > +		list_del_init(&vpu_buf->list);
> > > >
> > > >  	while ((buf = v4l2_m2m_src_buf_remove(m2m_ctx))) {
> > > >  		dev_dbg(inst->dev->dev, "%s: (Multiplanar) buf
> type %4u |
> > > index
> > > > %4u\n", @@ -1445,6 +1513,7 @@ static void
> > > > wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
> > > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > > +	inst->empty_queue = false;
> > > >
> > > >  	while (check_cmd) {
> > > >  		struct queue_status_info q_status; @@ -1452,16
> +1521,13 @@
> > > > static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
> > > >
> > > >  		wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS,
> > > &q_status);
> > > >
> > > > -		if (q_status.report_queue_count == 0)
> > > > -			break;
> > > > -
> > > > -		if (wave5_vpu_wait_interrupt(inst, VPU_DEC_TIMEOUT) <
> 0)
> > > > +		if ((inst->state == VPU_INST_STATE_STOP ||
> > > q_status.instance_queue_count == 0) &&
> > > > +		    q_status.report_queue_count == 0)
> > > >  			break;
> > > >
> > > >  		if (wave5_vpu_dec_get_output_info(inst,
> &dec_output_info))
> > > >  			dev_dbg(inst->dev->dev, "Getting decoding
> results from
> > > fw, fail\n");
> > > >  	}
> > > > -
> > > >  	v4l2_m2m_update_stop_streaming_state(m2m_ctx, q);
> > > >
> > > >  	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> > > > @@ -1548,13 +1614,24 @@ static void wave5_vpu_dec_device_run(void
> *priv)
> > > >  	struct queue_status_info q_status;
> > > >  	u32 fail_res = 0;
> > > >  	int ret = 0;
> > > > +	unsigned long flags;
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> > > bitstream data", __func__);
> > > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > > -	ret = fill_ringbuffer(inst);
> > > > -	if (ret) {
> > > > -		dev_warn(inst->dev->dev, "Filling ring buffer
> failed\n");
> > > > -		goto finish_job_and_return;
> > > > +	if (!inst->retry) {
> > > > +		mutex_lock(&inst->feed_lock);
> > > > +		ret = fill_ringbuffer(inst);
> > > > +		mutex_unlock(&inst->feed_lock);
> > > > +		if (ret < 0) {
> > > > +			dev_warn(inst->dev->dev, "Filling ring buffer
> > > failed\n");
> > > > +			goto finish_job_and_return;
> > > > +		} else if (!inst->eos &&
> > > > +			   inst->queuing_num == 0 &&
> > > > +			   inst->state == VPU_INST_STATE_PIC_RUN) {
> > > > +			dev_dbg(inst->dev->dev, "%s: no bitstream for
> feeding,
> > > so skip ", __func__);
> > > > +			inst->empty_queue = true;
> > > > +			goto finish_job_and_return;
> > > > +		}
> > > >  	}
> > > >
> > > >  	switch (inst->state) {
> > > > @@ -1590,7 +1667,9 @@ static void wave5_vpu_dec_device_run(void
> *priv)
> > > >  		 * we had a chance to switch, which leads to an
> invalid state
> > > >  		 * change.
> > > >  		 */
> > > > +		spin_lock_irqsave(&inst->state_spinlock, flags);
> > > >  		switch_state(inst, VPU_INST_STATE_PIC_RUN);
> > > > +		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > > >
> > > >  		/*
> > > >  		 * During DRC, the picture decoding remains pending,
> so just
> > > leave
> > > > the job @@ -1605,12 +1684,14 @@ static void
> > > wave5_vpu_dec_device_run(void *priv)
> > > >  		ret = wave5_prepare_fb(inst);
> > > >  		if (ret) {
> > > >  			dev_warn(inst->dev->dev, "Framebuffer
> preparation,
> > > fail: %d\n",
> > > > ret);
> > > > +			spin_lock_irqsave(&inst->state_spinlock, flags);
> > > >  			switch_state(inst, VPU_INST_STATE_STOP);
> > > > +			spin_unlock_irqrestore(&inst->state_spinlock,
> flags);
> > > >  			break;
> > > >  		}
> > > >
> > > >  		if (q_status.instance_queue_count) {
> > > > -			dev_dbg(inst->dev->dev, "%s: leave with active
> job",
> > > __func__);
> > > > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev,
> m2m_ctx);
> > > >  			return;
> > > >  		}
> > > >
> > > > @@ -1621,21 +1702,27 @@ static void wave5_vpu_dec_device_run(void
> *priv)
> > > >  			dev_err(inst->dev->dev,
> > > >  				"Frame decoding on m2m context (%p),
> fail: %d
> > > (result: %d)\n",
> > > >  				m2m_ctx, ret, fail_res);
> > > > -			break;
> > > > +			goto finish_job_and_return;
> > > >  		}
> > > >  		/* Return so that we leave this job active */
> > > > -		dev_dbg(inst->dev->dev, "%s: leave with active job",
> > > __func__);
> > > > -		return;
> > > > -	default:
> > > > -		WARN(1, "Execution of a job in state %s illegal.\n",
> > > state_to_str(inst->state));
> > > > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > > > +			inst->retry = true;
> > > > +		} else {
> > > > +			inst->retry = false;
> > > > +			if (!inst->eos)
> > > > +				inst->queuing_num--;
> > > > +		}
> > > >  		break;
> > > > +	default:
> > > > +		if (!v4l2_m2m_has_stopped(m2m_ctx))
> > > > +			WARN(1, "Execution of a job in state %s
> illegal.\n",
> > > > +			     state_to_str(inst->state));
> > > > +		return;
> > > >  	}
> > > >
> > > >  finish_job_and_return:
> > > > +	up(&inst->run_sem);
> > > >  	dev_dbg(inst->dev->dev, "%s: leave and finish job",
> __func__);
> > > > -	pm_runtime_mark_last_busy(inst->dev->dev);
> > > > -	pm_runtime_put_autosuspend(inst->dev->dev);
> > > > -	v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > >  }
> > > >
> > > >  static void wave5_vpu_dec_job_abort(void *priv) @@ -1647,10
> > > > +1734,7 @@ static void wave5_vpu_dec_job_abort(void *priv)
> > > >  	if (ret)
> > > >  		return;
> > > >
> > > > -	ret = wave5_vpu_dec_set_eos_on_firmware(inst);
> > > > -	if (ret)
> > > > -		dev_warn(inst->dev->dev,
> > > > -			 "Setting EOS for the bitstream, fail: %d\n",
> ret);
> > > > +	v4l2_m2m_job_finish(inst->v4l2_m2m_dev, inst-
> >v4l2_fh.m2m_ctx);
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_job_ready(void *priv) @@ -1686,7 +1770,8
> > > > @@ static int wave5_vpu_dec_job_ready(void *priv)
> > > >  				"No capture buffer ready to decode!\n");
> > > >  			break;
> > > >  		} else if (!wave5_is_draining_or_eos(inst) &&
> > > > -			   !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
> > > > +			   (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
> > > > +			    inst->empty_queue)) {
> > > >  			dev_dbg(inst->dev->dev,
> > > >  				"No bitstream data to decode!\n");
> > > >  			break;
> > > > @@ -1726,6 +1811,8 @@ static int wave5_vpu_open_dec(struct file
> *filp)
> > > >  	inst->ops = &wave5_vpu_dec_inst_ops;
> > > >
> > > >  	spin_lock_init(&inst->state_spinlock);
> > > > +	mutex_init(&inst->feed_lock);
> > > > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> > > >
> > > >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info),
> GFP_KERNEL);
> > > >  	if (!inst->codec_info)
> > > > @@ -1782,6 +1869,11 @@ static int wave5_vpu_open_dec(struct file
> *filp)
> > > >  	inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
> > > >
> > > >  	init_completion(&inst->irq_done);
> > > > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int),
> GFP_KERNEL);
> > > > +	if (ret) {
> > > > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > > > +		goto cleanup_inst;
> > > > +	}
> > > >
> > > >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> > > >  	if (inst->id < 0) {
> > > > @@ -1797,13 +1889,13 @@ static int wave5_vpu_open_dec(struct file *filp)
> > > >  	if (inst->dev->product_code != WAVE515_CODE)
> > > >  		wave5_vdi_allocate_sram(inst->dev);
> > > >
> > > > +	sema_init(&inst->run_sem, 1);
> > > > +	inst->run_thread = kthread_run(run_thread, inst, "run thread");
> > > > +
> > > >  	ret = mutex_lock_interruptible(&dev->dev_lock);
> > > >  	if (ret)
> > > >  		goto cleanup_inst;
> > > >
> > > > -	if (list_empty(&dev->instances))
> > > > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > > > -
> > > >  	list_add_tail(&inst->list, &dev->instances);
> > > >
> > > >  	mutex_unlock(&dev->dev_lock);
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > index 1e5fc5f8b856..7f1aa392805f 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > > @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
> > > >  	inst->frame_rate = 30;
> > > >
> > > >  	init_completion(&inst->irq_done);
> > > > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> > > > +	if (ret) {
> > > > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > > > +		goto cleanup_inst;
> > > > +	}
> > > >
> > > >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> > > >  	if (inst->id < 0) {
> > > > @@ -1774,9 +1779,6 @@ static int wave5_vpu_open_enc(struct file *filp)
> > > >  	if (ret)
> > > >  		goto cleanup_inst;
> > > >
> > > > -	if (list_empty(&dev->instances))
> > > > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > > > -
> > > >  	list_add_tail(&inst->list, &dev->instances);
> > > >
> > > >  	mutex_unlock(&dev->dev_lock);
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > > index 63a607d10433..e1e4be482d36 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > > @@ -51,17 +51,24 @@ static void wave5_vpu_handle_irq(void *dev_id)
> > > >  	u32 seq_done;
> > > >  	u32 cmd_done;
> > > >  	u32 irq_reason;
> > > > -	struct vpu_instance *inst;
> > > > +	u32 irq_subreason;
> > > > +	struct vpu_instance *inst, *tmp;
> > > >  	struct vpu_device *dev = dev_id;
> > > > +	int val;
> > > > +	unsigned long flags;
> > > >
> > > >  	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> > > >  	seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
> > > >  	cmd_done = wave5_vdi_read_register(dev, W5_RET_QUEUE_CMD_DONE_INST);
> > > > +
> > > > +	if (dev->irq < 0 && cmd_done > 0)
> > > > +		wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST, 0);
> > > > +
> > > >  	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
> > > >  	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
> > > >
> > > > -	list_for_each_entry(inst, &dev->instances, list) {
> > > > -
> > > > +	spin_lock_irqsave(&dev->irq_spinlock, flags);
> > > > +	list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> > > >  		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
> > > >  		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
> > > >  			if (dev->product_code == WAVE515_CODE &&
> > > > @@ -82,14 +89,22 @@ static void wave5_vpu_handle_irq(void *dev_id)
> > > >  		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
> > > >  			if (cmd_done & BIT(inst->id)) {
> > > >  				cmd_done &= ~BIT(inst->id);
> > > > -				wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
> > > > -							 cmd_done);
> > > > -				inst->ops->finish_process(inst);
> > > > +				if (dev->irq >= 0) {
> > > > +					irq_subreason =
> > > > +						wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> > > > +					if (!(irq_subreason & BIT(INT_WAVE5_DEC_PIC)))
> > > > +						wave5_vdi_write_register(dev,
> > > > +									 W5_RET_QUEUE_CMD_DONE_INST,
> > > > +									 cmd_done);
> > > > +				}
> > > > +				val = BIT(INT_WAVE5_DEC_PIC);
> > > > +				kfifo_in(&inst->irq_status, &val, sizeof(int));
> > > >  			}
> > > >  		}
> > > > -
> > > > -		wave5_vpu_clear_interrupt(inst, irq_reason);
> > > >  	}
> > > > +	spin_unlock_irqrestore(&dev->irq_spinlock, flags);
> > > > +
> > > > +	up(&dev->irq_sem);
> > > >  }
> > > >
> > > >  static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
> > > > @@ -121,6 +136,35 @@ static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
> > > >  	return HRTIMER_RESTART;
> > > >  }
> > > >
> > > > +static int irq_thread(void *data) {
> > > > +	struct vpu_device *dev = (struct vpu_device *)data;
> > > > +	struct vpu_instance *inst, *tmp;
> > > > +	int irq_status, ret;
> > > > +
> > > > +	while (!kthread_should_stop()) {
> > > > +		if (down_interruptible(&dev->irq_sem))
> > > > +			continue;
> > > > +
> > > > +		if (kthread_should_stop())
> > > > +			break;
> > > > +
> > > > +		mutex_lock(&dev->irq_lock);
> > > > +		list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> > > > +			while (kfifo_len(&inst->irq_status)) {
> > > > +				ret = kfifo_out(&inst->irq_status, &irq_status, sizeof(int));
> > > > +				if (!ret)
> > > > +					break;
> > > > +
> > > > +				inst->ops->finish_process(inst);
> > > > +			}
> > > > +		}
> > > > +		mutex_unlock(&dev->irq_lock);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
> > > >  				   u32 *revision)
> > > >  {
> > > > @@ -224,6 +268,8 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> > > >
> > > >  	mutex_init(&dev->dev_lock);
> > > >  	mutex_init(&dev->hw_lock);
> > > > +	mutex_init(&dev->irq_lock);
> > > > +	spin_lock_init(&dev->irq_spinlock);
> > > >  	dev_set_drvdata(&pdev->dev, dev);
> > > >  	dev->dev = &pdev->dev;
> > > >
> > > > @@ -266,6 +312,9 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> > > >  	}
> > > >  	dev->product = wave5_vpu_get_product_id(dev);
> > > >
> > > > +	sema_init(&dev->irq_sem, 1);
> > > > +	INIT_LIST_HEAD(&dev->instances);
> > > > +	dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
> > > >  	dev->irq = platform_get_irq(pdev, 0);
> > > >  	if (dev->irq < 0) {
> > > >  		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
> > > > @@ -288,7 +337,6 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> > > >  		}
> > > >  	}
> > > >
> > > > -	INIT_LIST_HEAD(&dev->instances);
> > > >  	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> > > >  	if (ret) {
> > > >  		dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
> > > > @@ -351,6 +399,12 @@ static void wave5_vpu_remove(struct platform_device *pdev)
> > > >  {
> > > >  	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
> > > >
> > > > +	if (dev->irq_thread) {
> > > > +		kthread_stop(dev->irq_thread);
> > > > +		up(&dev->irq_sem);
> > > > +		dev->irq_thread = NULL;
> > > > +	}
> > > > +
> > > >  	if (dev->irq < 0) {
> > > >  		kthread_destroy_worker(dev->worker);
> > > >  		hrtimer_cancel(&dev->hrtimer);
> > > > @@ -361,6 +415,7 @@ static void wave5_vpu_remove(struct platform_device *pdev)
> > > >
> > > >  	mutex_destroy(&dev->dev_lock);
> > > >  	mutex_destroy(&dev->hw_lock);
> > > > +	mutex_destroy(&dev->irq_lock);
> > > >  	reset_control_assert(dev->resets);
> > > >  	clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
> > > >  	wave5_vpu_enc_unregister_device(dev);
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > index e16b990041c2..9a28d0f770bc 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > @@ -197,8 +197,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
> > > >  	int retry = 0;
> > > >  	struct vpu_device *vpu_dev = inst->dev;
> > > >  	int i;
> > > > -	int inst_count = 0;
> > > > -	struct vpu_instance *inst_elm;
> > > >
> > > >  	*fail_res = 0;
> > > >  	if (!inst->codec_info)
> > > > @@ -223,6 +221,16 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
> > > >  		    retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> > > >  			ret = -ETIMEDOUT;
> > > >  			goto unlock_and_return;
> > > > +		} else if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING) {
> > > > +			struct dec_output_info dec_info;
> > > > +
> > > > +			mutex_unlock(&vpu_dev->hw_lock);
> > > > +			wave5_vpu_dec_get_output_info(inst, &dec_info);
> > > > +			ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
> > > > +			if (ret) {
> > > > +				pm_runtime_put_sync(inst->dev->dev);
> > > > +				return ret;
> > > > +			}
> > > >  		}
> > > >  	} while (ret != 0);
> > > >
> > > > @@ -240,11 +248,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
> > > >
> > > >  	wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
> > > >
> > > > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > > > -		inst_count++;
> > > > -	if (inst_count == 1)
> > > > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > > -
> > > >  unlock_and_return:
> > > >  	mutex_unlock(&vpu_dev->hw_lock);
> > > >  	pm_runtime_put_sync(inst->dev->dev);
> > > > @@ -710,8 +713,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
> > > >  	int ret;
> > > >  	int retry = 0;
> > > >  	struct vpu_device *vpu_dev = inst->dev;
> > > > -	int inst_count = 0;
> > > > -	struct vpu_instance *inst_elm;
> > > >
> > > >  	*fail_res = 0;
> > > >  	if (!inst->codec_info)
> > > > @@ -755,11 +756,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
> > > >
> > > >  	wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
> > > >
> > > > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > > > -		inst_count++;
> > > > -	if (inst_count == 1)
> > > > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > > -
> > > >  	mutex_unlock(&vpu_dev->hw_lock);
> > > >  	pm_runtime_put_sync(inst->dev->dev);
> > > >
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > index 45615c15beca..d26590141562 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > @@ -8,6 +8,7 @@
> > > >  #ifndef VPUAPI_H_INCLUDED
> > > >  #define VPUAPI_H_INCLUDED
> > > >
> > > > +#include <linux/kfifo.h>
> > > >  #include <linux/idr.h>
> > > >  #include <linux/genalloc.h>
> > > >  #include <media/v4l2-device.h>
> > > > @@ -163,7 +164,7 @@ enum set_param_option {
> > > >
> > > >  #define BUFFER_MARGIN				4096
> > > >
> > > > -#define MAX_FIRMWARE_CALL_RETRY			10
> > > > +#define MAX_FIRMWARE_CALL_RETRY			30
> > > >
> > > >  #define VDI_LITTLE_ENDIAN	0x0
> > > >
> > > > @@ -747,6 +748,7 @@ struct vpu_device {
> > > >  	struct video_device *video_dev_enc;
> > > >  	struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
> > > >  	struct mutex hw_lock; /* lock hw configurations */
> > > > +	struct mutex irq_lock;
> > > >  	int irq;
> > > >  	enum product_id product;
> > > >  	struct vpu_attr attr;
> > > > @@ -764,7 +766,10 @@ struct vpu_device {
> > > >  	struct kthread_worker *worker;
> > > >  	int vpu_poll_interval;
> > > >  	int num_clks;
> > > > +	struct task_struct *irq_thread;
> > > > +	struct semaphore irq_sem;
> > > >  	struct reset_control *resets;
> > > > +	spinlock_t irq_spinlock; /* protect instances list */
> > > >  };
> > > >
> > > >  struct vpu_instance;
> > > > @@ -787,7 +792,7 @@ struct vpu_instance {
> > > >  	enum v4l2_xfer_func xfer_func;
> > > >  	enum v4l2_ycbcr_encoding ycbcr_enc;
> > > >  	enum v4l2_quantization quantization;
> > > > -
> > > > +	struct kfifo irq_status;
> > > >  	enum vpu_instance_state state;
> > > >  	enum vpu_instance_type type;
> > > >  	const struct vpu_instance_ops *ops;
> > > > @@ -812,11 +817,17 @@ struct vpu_instance {
> > > >  	bool cbcr_interleave;
> > > >  	bool nv21;
> > > >  	bool eos;
> > > > +	bool retry;
> > > > +	bool empty_queue;
> > > > +	int queuing_num;
> > > > +	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> > > >  	struct vpu_buf bitstream_vbuf;
> > > >  	dma_addr_t last_rd_ptr;
> > > >  	size_t remaining_consumed_bytes;
> > > >  	bool needs_reallocation;
> > > >
> > > > +	struct semaphore run_sem;
> > > > +	struct task_struct *run_thread;
> > > >  	unsigned int min_src_buf_count;
> > > >  	unsigned int rot_angle;
> > > >  	unsigned int mirror_direction;
> >




