Dne sreda, 09. oktober 2019 ob 12:18:45 CEST je Hans Verkuil napisal(a): > On 10/7/19 9:01 PM, Jernej Škrabec wrote: > > Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil napisal(a): > >> Hi Jernej, > >> > >> On 9/29/19 10:00 PM, Jernej Skrabec wrote: > >>> This series adds support for decoding multi-slice H264 frames along with > >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF. > >>> > >>> Code was tested by modified ffmpeg, which can be found here: > >>> https://github.com/jernejsk/FFmpeg, branch mainline-test > >>> It has to be configured with at least following options: > >>> --enable-v4l2-request --enable-libudev --enable-libdrm > >>> > >>> Samples used for testing: > >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4 > >>> http://jernej.libreelec.tv/videos/h264/h264.mp4 > >>> > >>> Command line used for testing: > >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt > >>> bgra -f fbdev /dev/fb0 > >>> > >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm > >>> not sure how. ffmpeg follows exactly which slice is last in frame > >>> and sets hold flag accordingly. Improper usage of hold flag would > >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas > >>> how to test this are welcome! > >>> > >>> Thanks to Jonas for adjusting ffmpeg. > >>> > >>> Please let me know what you think. > >>> > >>> Best regards, > >>> Jernej > >>> > >>> Changes from v1: > >>> - added Rb tags > >>> - updated V4L2_DEC_CMD_FLUSH documentation > >>> - updated first slice detection in Cedrus > >>> - hold capture buffer flag is set according to source format > >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers > >>> > >>> Hans Verkuil (2): > >>> vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF > >>> videodev2.h: add V4L2_DEC_CMD_FLUSH > >>> > >>> Jernej Skrabec (4): > >>> media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers > >>> media: cedrus: Detect first slice of a frame > >>> media: cedrus: h264: Support multiple slices per frame > >>> media: cedrus: Add support for holding capture buffer > >>> > >>> Documentation/media/uapi/v4l/buffer.rst | 13 ++++++ > >>> .../media/uapi/v4l/vidioc-decoder-cmd.rst | 10 +++- > >>> .../media/uapi/v4l/vidioc-reqbufs.rst | 6 +++ > >>> .../media/videodev2.h.rst.exceptions | 1 + > >>> .../media/common/videobuf2/videobuf2-v4l2.c | 8 +++- > >>> drivers/media/v4l2-core/v4l2-mem2mem.c | 35 ++++++++++++++ > >>> drivers/staging/media/sunxi/cedrus/cedrus.h | 1 + > >>> .../staging/media/sunxi/cedrus/cedrus_dec.c | 11 +++++ > >>> .../staging/media/sunxi/cedrus/cedrus_h264.c | 11 ++++- > >>> .../staging/media/sunxi/cedrus/cedrus_hw.c | 8 ++-- > >>> .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++ > >>> include/media/v4l2-mem2mem.h | 46 +++++++++++++++++++ > >>> include/media/videobuf2-core.h | 3 ++ > >>> include/media/videobuf2-v4l2.h | 5 ++ > >>> include/uapi/linux/videodev2.h | 14 ++++-- > >>> 15 files changed, 175 insertions(+), 11 deletions(-) > >> > >> I didn't want to make a v3 of this series, instead I hacked this patch > >> that > >> will hopefully do all the locking right. > >> > >> Basically I moved all the 'held' related code into v4l2-mem2mem under > >> job_spinlock. This simplifies the driver code as well. > >> > >> But this is a hack that sits on top of this series. If your ffmpeg tests > >> are now successful, then I'll turn this into a proper series with > >> correct documentation (a lot of the comments are now wrong with this > >> patch, so just ignore that). > > > > Thanks for looking into this! With small fix mentioned below, it works! > > Note that both scenarios I tested (flushing during decoding and flushing > > after decoding is finished) are focused on capture queue. In order to > > trigger output queue flush, ffmpeg would need to queue multiple jobs and > > call flush before they are all processed. This is not something I can do > > at this time. Maybe Jonas can help with modifying ffmpeg appropriately. > > However, code for case seems correct to me. > > > >> Regards, > >> > >> Hans > >> > >> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c > >> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab > >> 100644 > >> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c > >> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c > >> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx > >> *m2m_ctx) } > >> > >> } > >> > >> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev, > >> - struct v4l2_m2m_ctx *m2m_ctx) > >> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev, > >> + struct v4l2_m2m_ctx *m2m_ctx) > >> > >> { > >> > >> - unsigned long flags; > >> - > >> - spin_lock_irqsave(&m2m_dev->job_spinlock, flags); > >> > >> if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) { > >> > >> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> > >> dprintk("Called by an instance not currently > > > > running\n"); > > > >> - return; > >> + return false; > >> > >> } > >> > >> list_del(&m2m_dev->curr_ctx->queue); > >> m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING); > >> wake_up(&m2m_dev->curr_ctx->finished); > >> m2m_dev->curr_ctx = NULL; > >> > >> + return true; > >> +} > >> > >> - spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> - > >> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev, > >> + struct v4l2_m2m_ctx *m2m_ctx) > >> +{ > >> > >> /* This instance might have more buffers ready, but since we do not > >> > >> * allow more than one job on the job_queue per instance, each has > >> * to be scheduled separately after the previous one finishes. */ > >> > >> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev > >> *m2m_dev, */ > >> > >> schedule_work(&m2m_dev->job_work); > >> > >> } > >> > >> + > >> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev, > >> + struct v4l2_m2m_ctx *m2m_ctx) > >> +{ > >> + unsigned long flags; > >> + > >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags); > >> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) { > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + return; > >> + } > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + > >> + v4l2_m2m_job_next(m2m_dev, m2m_ctx); > >> +} > >> > >> EXPORT_SYMBOL(v4l2_m2m_job_finish); > >> > >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev, > >> + struct v4l2_m2m_ctx *m2m_ctx, > >> + enum vb2_buffer_state state) > >> +{ > >> + struct vb2_v4l2_buffer *src_buf, *dst_buf; > >> + unsigned long flags; > >> + > >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags); > >> + src_buf = v4l2_m2m_src_buf_remove(m2m_ctx); > >> + dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx); > >> + > >> + if (!src_buf || !dst_buf) { > >> + pr_err("Missing source and/or destination buffers\n"); > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + return; > >> + } > >> + v4l2_m2m_buf_done(src_buf, state); > >> + if (!dst_buf->is_held) { > >> + v4l2_m2m_dst_buf_remove(m2m_ctx); > >> + v4l2_m2m_buf_done(dst_buf, state); > >> + } > >> + if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) { > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + return; > >> + } > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + > >> + v4l2_m2m_job_next(m2m_dev, m2m_ctx); > >> +} > >> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held); > >> + > >> +/** > >> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should > >> be > >> + * released > >> + * > >> + * @out_vb: the output buffer > >> + * @cap_vb: the capture buffer > >> + * > >> + * This helper function returns true if the current capture buffer > >> should > >> + * be released to vb2. This is the case if the output buffer specified > >> that + * the capture buffer should be held (i.e. not returned to vb2) > >> AND if the + * timestamp of the capture buffer differs from the output > >> buffer timestamp. + * > >> + * This helper is to be called at the start of the device_run callback: > >> + * > >> + * .. code-block:: c > >> + * > >> + * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) { > >> + * v4l2_m2m_dst_buf_remove(m2m_ctx); > >> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE); > >> + * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx); > >> + * } > >> + * cap_vb->is_held = out_vb->flags & > > > > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > > > >> + * > >> + * ... > >> + * > >> + * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE); > >> + * if (!cap_vb->is_held) { > >> + * v4l2_m2m_dst_buf_remove(m2m_ctx); > >> + * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE); > >> + * } > >> + * > >> + * This allows for multiple output buffers to be used to fill in a > >> single > >> + * capture buffer. This is typically used by stateless decoders where > >> + * multiple e.g. H.264 slices contribute to a single decoded frame. > >> + */ > >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx > >> *m2m_ctx) +{ > >> + struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev; > >> + struct vb2_v4l2_buffer *src, *dst; > >> + unsigned long flags; > >> + > >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags); > >> + src = v4l2_m2m_next_src_buf(m2m_ctx); > >> + dst = v4l2_m2m_next_dst_buf(m2m_ctx); > >> + > >> + if (dst->is_held && dst->vb2_buf.copied_timestamp && > >> + src->vb2_buf.timestamp != dst->vb2_buf.timestamp) { > >> + dst->is_held = false; > >> + v4l2_m2m_dst_buf_remove(m2m_ctx); > >> + v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE); > >> + dst = v4l2_m2m_next_dst_buf(m2m_ctx); > >> + } > >> + dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > >> + src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> + return dst; > >> +} > >> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf); > >> + > >> > >> int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx, > >> > >> struct v4l2_requestbuffers *reqbufs) > >> > >> { > >> > >> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct > >> file *file, void *priv, { > >> > >> struct v4l2_fh *fh = file->private_data; > >> struct vb2_v4l2_buffer *out_vb, *cap_vb; > >> > >> + struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev; > >> + unsigned long flags; > >> > >> int ret; > >> > >> ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc); > >> if (ret < 0) > >> > >> return ret; > >> > >> + spin_lock_irqsave(&m2m_dev->job_spinlock, flags); > >> > >> out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx); > >> cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx); > >> > >> - if (out_vb) > >> + if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF)) > > > > { > > > >> out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > >> > >> - else if (cap_vb && cap_vb->is_held) > >> - v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE); > >> + } else if (cap_vb && cap_vb->is_held) { > >> + cap_vb->is_held = false; > >> + if (m2m_dev->curr_ctx) { > > > > Above condition should be negated. > > Close. It should check that this buffer isn't currently being processed. > So: > > if (m2m_dev->curr_ctx != fh->m2m_ctx) { > > Can you test with this change? If this works, then I'll post a proper > series for this. I confirm that it works. Best regards, Jernej > > Thanks! > > Hans > > > Best regards, > > Jernej > > > >> + v4l2_m2m_dst_buf_remove(fh->m2m_ctx); > >> + v4l2_m2m_buf_done(cap_vb, > > > > VB2_BUF_STATE_DONE); > > > >> + } > >> + } > >> + spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags); > >> > >> return 0; > >> > >> } > >> > >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > >> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index > >> 67f7d4326fc1..4e30f263b427 100644 > >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > >> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv) > >> > >> struct media_request *src_req; > >> > >> run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx); > >> > >> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx); > >> - > >> - if (v4l2_m2m_release_capture_buf(run.src, run.dst)) { > >> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx); > >> - v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE); > >> - run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx); > >> - } > >> - run.dst->is_held = run.src->flags & > > > > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > > > >> + run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx); > >> > >> run.first_slice = !run.dst->vb2_buf.copied_timestamp || > >> > >> run.src->vb2_buf.timestamp != run.dst- > >> > >> vb2_buf.timestamp; > >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > >> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index > >> 99fedec80224..242cad82cc8c 100644 > >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > >> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data) > >> > >> { > >> > >> struct cedrus_dev *dev = data; > >> struct cedrus_ctx *ctx; > >> > >> - struct vb2_v4l2_buffer *src_buf, *dst_buf; > >> > >> enum vb2_buffer_state state; > >> enum cedrus_irq_status status; > >> > >> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data) > >> > >> dev->dec_ops[ctx->current_codec]->irq_disable(ctx); > >> dev->dec_ops[ctx->current_codec]->irq_clear(ctx); > >> > >> - src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx); > >> - dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx); > >> - > >> - if (!src_buf || !dst_buf) { > >> - v4l2_err(&dev->v4l2_dev, > >> - "Missing source and/or destination > > > > buffers\n"); > > > >> - return IRQ_HANDLED; > >> - } > >> - > >> > >> if (status == CEDRUS_IRQ_ERROR) > >> > >> state = VB2_BUF_STATE_ERROR; > >> > >> else > >> > >> state = VB2_BUF_STATE_DONE; > >> > >> - v4l2_m2m_buf_done(src_buf, state); > >> - if (!dst_buf->is_held) { > >> - v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx); > >> - v4l2_m2m_buf_done(dst_buf, state); > >> - } > >> - v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx); > >> + v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, state); > >> > >> return IRQ_HANDLED; > >> > >> } > >> > >> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h > >> index 8ae2f56c7fa3..48ca7d3eaa3d 100644 > >> --- a/include/media/v4l2-mem2mem.h > >> +++ b/include/media/v4l2-mem2mem.h > >> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx > >> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev, > >> > >> struct v4l2_m2m_ctx *m2m_ctx); > >> > >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev, > >> + struct v4l2_m2m_ctx *m2m_ctx, > >> + enum vb2_buffer_state state); > >> + > >> > >> static inline void > >> v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state > >> state) > >> > >> { > >> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct > >> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb, > >> > >> bool copy_frame_flags); > >> > >> -/** > >> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should > >> be > >> - * released > >> - * > >> - * @out_vb: the output buffer > >> - * @cap_vb: the capture buffer > >> - * > >> - * This helper function returns true if the current capture buffer > >> should > >> - * be released to vb2. This is the case if the output buffer specified > >> that - * the capture buffer should be held (i.e. not returned to vb2) > >> AND if the - * timestamp of the capture buffer differs from the output > >> buffer timestamp. - * > >> - * This helper is to be called at the start of the device_run callback: > >> - * > >> - * .. code-block:: c > >> - * > >> - * if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) { > >> - * v4l2_m2m_dst_buf_remove(m2m_ctx); > >> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE); > >> - * cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx); > >> - * } > >> - * cap_vb->is_held = out_vb->flags & > > > > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; > > > >> - * > >> - * ... > >> - * > >> - * v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE); > >> - * if (!cap_vb->is_held) { > >> - * v4l2_m2m_dst_buf_remove(m2m_ctx); > >> - * v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE); > >> - * } > >> - * > >> - * This allows for multiple output buffers to be used to fill in a > >> single > >> - * capture buffer. This is typically used by stateless decoders where > >> - * multiple e.g. H.264 slices contribute to a single decoded frame. > >> - */ > >> -static inline bool v4l2_m2m_release_capture_buf(const struct > >> vb2_v4l2_buffer *out_vb, - > >> > > const struct vb2_v4l2_buffer *cap_vb) > > > >> -{ > >> - return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp && > >> - out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp; > >> -} > >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx > >> *m2m_ctx); > >> > >> /* v4l2 request helper */