Re: [PATCH v2 1/5] v4l2-mem2mem: Fix missing v4l2_m2m_try_run call

Hans Verkuil <hverkuil@xxxxxxxxx> · Fri, 27 Jul 2018 17:29:42 +0200

On 27/07/18 17:16, Ezequiel Garcia wrote:
> On Fri, 2018-07-27 at 15:41 +0200, Hans Verkuil wrote:
>> On 27/07/18 15:35, Hans Verkuil wrote:
>>> On 25/07/18 19:15, Ezequiel Garcia wrote:
>>>> Commit 34dbb848d5e4 ("media: mem2mem: Remove excessive try_run call")
>>>> removed a redundant call to v4l2_m2m_try_run but instead introduced
>>>> a bug. Consider the following case:
>>>>
>>>>  1) Context A schedules, queues and runs job A.
>>>>  2) While the m2m device is running, context B schedules
>>>>     and queues job B. Job B cannot run, because it has to
>>>>     wait for job A.
>>>>  3) Job A completes, calls v4l2_m2m_job_finish, and tries
>>>>     to queue a job for context A, but since the context is
>>>>     empty it won't do anything.
>>>>
>>>> In this scenario, queued job B will never run. Fix this by calling
>>>> v4l2_m2m_try_run from v4l2_m2m_try_schedule.
>>>>
>>>> While here, add more documentation to these functions.
>>>>
>>>> Fixes: 34dbb848d5e4 ("media: mem2mem: Remove excessive try_run call")
>>>> Signed-off-by: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx>
>>>
>>> So just to be clear: this first patch fixes a regression and can be applied
>>> separately from the other patches?
>>>
> 
> That is correct, it's a regression and can be applied independently.
> 
> This patch has been tested independently of the rest of the series,
> using the test introduced in "selftests: media_tests: Add a
> memory-to-memory concurrent stress test".
> 
> Without this patch, the test fails.
> 
>>>> ---
>>>>  drivers/media/v4l2-core/v4l2-mem2mem.c | 32 +++++++++++++++++++++++---
>>>>  1 file changed, 29 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> index 5f9cd5b74cda..dfd796621b06 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> @@ -209,15 +209,23 @@ static void v4l2_m2m_try_run(struct v4l2_m2m_dev *m2m_dev)
>>>>  	m2m_dev->curr_ctx->job_flags |= TRANS_RUNNING;
>>>>  	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>>>>  
>>>> +	dprintk("Running job on m2m_ctx: %p\n", m2m_dev->curr_ctx);
>>>
>>> Is this intended? It feels out of place in this patch.
>>>
> 
> It was intended :) Since the patch is adding some more documentation,
> it felt OK to add this debug message, as it also helps the testing.
> 
> But I can drop it if you think it's out of place.
> 
>>>>  	m2m_dev->m2m_ops->device_run(m2m_dev->curr_ctx->priv);
>>>>  }
>>>>  
>>>> -void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx *m2m_ctx)
>>>> +/*
>>>> + * __v4l2_m2m_try_queue() - queue a job
>>>> + * @m2m_dev: m2m device
>>>> + * @m2m_ctx: m2m context
>>>> + *
>>>> + * Check if this context is ready to queue a job.
>>>> + *
>>>> + * This function can run in interrupt context.
>>>> + */
>>>> +static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev, struct v4l2_m2m_ctx *m2m_ctx)
>>>>  {
>>>> -	struct v4l2_m2m_dev *m2m_dev;
>>>>  	unsigned long flags_job, flags_out, flags_cap;
>>>>  
>>>> -	m2m_dev = m2m_ctx->m2m_dev;
>>>>  	dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>>>>  
>>>>  	if (!m2m_ctx->out_q_ctx.q.streaming
>>>> @@ -275,7 +283,25 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx *m2m_ctx)
>>>>  	m2m_ctx->job_flags |= TRANS_QUEUED;
>>>>  
>>>>  	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags_job);
>>>> +}
>>>> +
>>>> +/**
>>>> + * v4l2_m2m_try_schedule() - schedule and possibly run a job for any context
>>>> + * @m2m_ctx: m2m context
>>>> + *
>>>> + * Check if this context is ready to queue a job. If suitable,
>>>> + * run the next queued job on the mem2mem device.
>>>> + *
>>>> + * This function shouldn't run in interrupt context.
>>>> + *
>>>> + * Note that v4l2_m2m_try_schedule() can schedule one job for this context,
>>>> + * and then run another job for another context.
>>>> + */
>>>> +void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx *m2m_ctx)
>>>> +{
>>>> +	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
>>>>  
>>>> +	__v4l2_m2m_try_queue(m2m_dev, m2m_ctx);
>>>
>>> Why introduce __v4l2_m2m_try_queue? Why not keep it in here?
>>>
> 
> Well, specifically we need to make sure xxx_try_run is called
> even if no job was queued. After a lot of back and forth, I ended
> up thinking the queue and run operations were different and needed
> different functions.
> 
> If we were to keep the code in try_schedule, we would have to use
> some extra conditionals or gotos to make sure try_run is called even
> if no job was queued.
> 
> The important thing to keep in mind here is that these functions
> could be called on a given context, but then actually run a job
> for another context.
> 
>>>>  	v4l2_m2m_try_run(m2m_dev);
>>
>> I'm completely confused: v4l2_m2m_try_schedule() already calls v4l2_m2m_try_run()!
>>
>> Either my brain has crashed due to the heatwave I'm suffering through, or there
>> is something amiss with this patch.
>>
>>
> 
> Right, v4l2_m2m_try_schedule calls v4l2_m2m_try_run *if* a job was queued.
> This might not always be the case.
> 
> Consider you schedule two jobs, both will be queued but only one will run.
> When will the second job run? Answer: never :)

Ah, now I get it. I hadn't seen that the 'return' statements in try_schedule
would prevent the try_run call at the end.

Thanks for the patch, I'll make a pull request for this one.

Regards,

	Hans