Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop

Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> · Wed, 14 Dec 2016 23:16:08 +0200

Hi Kieran,

On Wednesday 14 Dec 2016 19:56:51 Kieran Bingham wrote:
> Hello Me.
> 
> Ok, so a bit of investigation into *why* we have an unbalanced
> media_pipeline stop call.
> 
> After a suspend/resume cycle, the function vsp1_pm_runtime_resume() is
> called by the PM framework.
> 
> The hardware is unable to reset successfully and the reset call returns
> -ETIMEDOUT which gets passed back to the PM framework as a failure and
> thus the device is not 'resumed'
> 
> This process is commenced, as the DU driver tries to enable the VSP, we
> get the following call stack:
> 
> rcar_du_crtc_resume()
>   rcar_du_vsp_enable()
>     vsp1_du_setup_lif() // returns void
>       vsp1_device_get() // returns -ETIMEDOUT,

I suspect the call stack to not be entirely accurate as the 
rcar_du_crtc_resume() is never called :-)

> As the vsp1_du_setup_lif() returns void, the fact that the hardware
> failed to start is ignored.
> 
> Later we get a call to  rcar_du_vsp_disable(), which again calls into
> vsp1_du_setup_lif(), this time to disable the pipeline. And it is here,
> that the call to media_pipeline_stop() is an unbalanced call, as the
> corresponding media_pipeline_start() would only have been called if the
> vsp1_device_get() had succeeded, thus we find ourselves with a kernel
> panic on a null dereference.
> 
> Sorry for the terse notes, they are possibly/probably really for me if I
> get tasked to look back at this.
> --
> Regards
> 
> Kieran
> 
> On 13/12/16 17:59, Kieran Bingham wrote:
> > media_entity_pipeline_stop() can be called through error paths with a
> > NULL entity pipe object. In this instance, stopping is a no-op, so
> > simply return without any action
> > 
> > Signed-off-by: Kieran Bingham <kieran.bingham+renesas@xxxxxxxxxxxxxxxx>
> > ---
> > 
> > I've marked this patch as RFC, although if deemed suitable, by all means
> > integrate it as is.
> > 
> > When testing suspend/resume operations on VSP1, I encountered a segfault
> > on the WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The
> > simple protection fix is to return early in this instance, as this patch
> > does however:
> > 
> > A) Does this early return path warrant a WARN() statement itself, to
> > identify drivers which are incorrectly calling
> > media_entity_pipeline_stop() with an invalid entity, or would this just
> > be noise ...
> > 
> > and therefore..
> > 
> > B) I also partly assume this patch could simply get NAK'd with a request
> > to go and dig out the root cause of calling media_entity_pipeline_stop()
> > with an invalid entity.
> > 
> > My brief investigation so far here so far shows that it's almost a second
> > order fault - where the first suspend resume cycle completes but leaves
> > the entity in an invalid state having followed an error path - and then
> > on a second suspend/resume - the stop fails with the affected segfault.
> > 
> > If statement A) or B) apply here, please drop this patch from the series,
> > and don't consider it a blocking issue for the other 3 patches.
> > 
> > Kieran
> > 
> >  drivers/media/media-entity.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
> > index c68239e60487..93c9cbf4bf46 100644
> > --- a/drivers/media/media-entity.c
> > +++ b/drivers/media/media-entity.c
> > @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity
> > *entity)> 
> >  	struct media_entity_graph *graph = &entity->pipe->graph;
> >  	struct media_pipeline *pipe = entity->pipe;
> > 
> > +	if (!pipe)
> > +		return;
> > 
> >  	WARN_ON(!pipe->streaming_count);
> >  	media_entity_graph_walk_start(graph, entity);

-- 
Regards,

Laurent Pinchart