Hi Kieran, On Wed, Dec 14, 2016 at 12:27:37PM +0000, Kieran Bingham wrote: > Hi Sakari, > > On 14/12/16 07:28, Sakari Ailus wrote: > > Hi Kieran, > > > > On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote: > >> media_entity_pipeline_stop() can be called through error paths with a > >> NULL entity pipe object. In this instance, stopping is a no-op, so > >> simply return without any action > > > > The approach of returning silently is wrong; the streaming count is indeed a > > count: you have to decrement it the exactly same number of times it's been > > increased. Code that attempts to call __media_entity_pipeline_stop() on an > > entity that's not streaming is simply buggy. > > Ok, Thanks for the heads up on where to look, as I suspected we are in > section B) of my comments below :D > > > I've got a patch here that adds a warning to graph traversal on streaming > > count being zero. I sent a pull request including that some days ago: > > > > <URL:http://www.spinics.net/lists/linux-media/msg108980.html> > > <URL:http://www.spinics.net/lists/linux-media/msg108995.html> > > Excellent, thanks, I've merged your branch into mine, and I'll > investigate where the error is actually coming from. > > -- > Ok - so I've merged your patches, (and dropped this patch) but they > don't fire any warning before I hit my null-pointer dereference in > __media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count); > > So I think this is a case of calling stop on an invalid entity rather > than an unbalanced start/stop somehow: Not necessarily. The pipe is set to NULL if the count goes to zero. This check should be dropped, it's ill-placed anyway: if streaming count is zero the pipe is expected to be NULL anyway in which case you get a NULL pointer exception (that you saw there). With the check removed, you should get the warning instead. > > [ 613.830334] vsp1 fea38000.vsp: failed to reset wpf.0 > [ 613.838137] PM: resume of devices complete after 124.410 msecs > [ 613.847390] PM: Finishing wakeup. > [ 613.852247] Restarting tasks ... done. > [ 615.864801] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full - > flow control rx/tx > [ 617.011299] vsp1 fea38000.vsp: failed to reset wpf.0 > [ 617.017777] vsp1 fea38000.vsp: DRM pipeline stop timeout > [ 617.024649] Unable to handle kernel NULL pointer dereference at > virtual address 00000000 > [ 617.034273] pgd = ffffff80098c5000 > [ 617.039171] [00000000] *pgd=000000073fffe003[ 617.043336] , > *pud=000000073fffe003 > , *pmd=0000000000000000[ 617.050353] > [ 617.053282] Internal error: Oops: 96000005 [#1] PREEMPT SMP > [ 617.060309] Modules linked in: > [ 617.064793] CPU: 1 PID: 683 Comm: kworker/1:2 Not tainted > 4.9.0-00506-g4d2a939ea654 #350 > [ 617.074320] Hardware name: Renesas Salvator-X board based on r8a7795 (DT) > [ 617.082578] Workqueue: events drm_mode_rmfb_work_fn > [ 617.088884] task: ffffffc6fabd2b00 task.stack: ffffffc6f9d90000 > [ 617.096246] PC is at __media_pipeline_stop+0x24/0xe8 > [ 617.102644] LR is at media_pipeline_stop+0x34/0x48 > > <regs and stack snipped> > > [ 617.863386] [<ffffff800853e724>] __media_pipeline_stop+0x24/0xe8 > [ 617.870417] [<ffffff800853e81c>] media_pipeline_stop+0x34/0x48 > [ 617.877272] [<ffffff8008568000>] vsp1_du_setup_lif+0x68/0x5a8 > [ 617.884047] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38 > [ 617.890898] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8 > [ 617.897749] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38 > [ 617.904683] [<ffffff80084ac9b0>] > drm_atomic_helper_commit_modeset_disables+0x1b0/0x3d8 > [ 617.913634] [<ffffff80084db9cc>] rcar_du_atomic_complete+0x34/0xc8 > [ 617.920828] [<ffffff80084dbd20>] rcar_du_atomic_commit+0x2c0/0x2d0 > [ 617.928012] [<ffffff80084ceffc>] drm_atomic_commit+0x5c/0x68 > [ 617.934663] [<ffffff80084ad2e0>] drm_atomic_helper_set_config+0x90/0xd0 > [ 617.942288] [<ffffff80084c06a0>] drm_mode_set_config_internal+0x70/0x100 > [ 617.949996] [<ffffff80084c0760>] drm_crtc_force_disable+0x30/0x40 > [ 617.957084] [<ffffff80084d0b10>] drm_framebuffer_remove+0x100/0x128 > [ 617.964347] [<ffffff80084d0b80>] drm_mode_rmfb_work_fn+0x48/0x60 > [ 617.971352] [<ffffff80080e9770>] process_one_work+0x1e0/0x738 > [ 617.978084] [<ffffff80080e9f24>] worker_thread+0x25c/0x4a0 > [ 617.984546] [<ffffff80080f11ac>] kthread+0xd4/0xe8 > [ 617.990305] [<ffffff8008083680>] ret_from_fork+0x10/0x50 > > > So, I'll have to schedule some time to investigate this deeper and find > out where we are calling stop on an invalid entity object. > > I agree that this patch should simply be dropped though, it was more to > promote a bit of discussion on the area to help me understand what was > going on, which it has!. > > Thankyou Sakari! > > -- > Regards > > Kieran > > > > I think the check here could simply be removed as the check is done for > > every entity in the pipeline with the above patch. > > > > If there's still a wish to check for the pipe field which should not be > > written by drivers, it should be done during pipeline traversal so that it > > applies to all entities in the pipeline, not just where traversal starts. > > > >> > >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@xxxxxxxxxxxxxxxx> > >> --- > >> > >> I've marked this patch as RFC, although if deemed suitable, by all means > >> integrate it as is. > >> > >> When testing suspend/resume operations on VSP1, I encountered a segfault on the > >> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple > >> protection fix is to return early in this instance, as this patch does however: > >> > >> A) Does this early return path warrant a WARN() statement itself, to identify > >> drivers which are incorrectly calling media_entity_pipeline_stop() with an > >> invalid entity, or would this just be noise ... > >> > >> and therefore.. > >> > >> B) I also partly assume this patch could simply get NAK'd with a request to go > >> and dig out the root cause of calling media_entity_pipeline_stop() with an > >> invalid entity. > >> > >> My brief investigation so far here so far shows that it's almost a second order > >> fault - where the first suspend resume cycle completes but leaves the entity in > >> an invalid state having followed an error path - and then on a second > >> suspend/resume - the stop fails with the affected segfault. > >> > >> If statement A) or B) apply here, please drop this patch from the series, and > >> don't consider it a blocking issue for the other 3 patches. > >> > >> Kieran > >> > >> > >> drivers/media/media-entity.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c > >> index c68239e60487..93c9cbf4bf46 100644 > >> --- a/drivers/media/media-entity.c > >> +++ b/drivers/media/media-entity.c > >> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity) > >> struct media_entity_graph *graph = &entity->pipe->graph; > >> struct media_pipeline *pipe = entity->pipe; > >> > >> + if (!pipe) > >> + return; > >> > >> WARN_ON(!pipe->streaming_count); > >> media_entity_graph_walk_start(graph, entity); > > -- Regards, Sakari Ailus e-mail: sakari.ailus@xxxxxx XMPP: sailus@xxxxxxxxxxxxxx