Re: Race in rcar-v4l2.c

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Tomi,

Thanks for testing that series.

On 2024-09-06 15:28:21 +0300, Tomi Valkeinen wrote:
> On 06/09/2024 14:27, Tomi Valkeinen wrote:
> > Hi,
> > 
> > On 06/09/2024 14:14, Tomi Valkeinen wrote:
> > > Hi Niklas,
> > > 
> > > On 06/09/2024 13:14, Niklas Söderlund wrote:
> > > > Hi Tomi,
> > > > 
> > > > Thanks for your report.
> > > > 
> > > > I have an on-going series trying to clean this all up [1]. The one
> > > > change in the v4l-async core I proposed was however rejected and I have
> > > > yet to circle back to figure out a different solution.
> > > > 
> > > > Could you give it a try and see if it also solves this issue?
> > > > 
> > > > 1. [PATCH 0/6] media: rcar-vin: Make use of multiple connections
> > > > in v4l-async
> > > >     https://lore.kernel.org/linux-renesas-
> > > > soc/20240129202254.1126012-1-niklas.soderlund+renesas@xxxxxxxxxxxx/
> > > 
> > > The compilation fails (broken in media: rcar-vin: Simplify remote
> > > source type detection) with:
> > > 
> > > drivers/media/platform/renesas/rcar-vin/rcar-dma.c:767:24: error:
> > > ‘struct rvin_dev’ has no member named ‘is_csi’
> > > 
> > > If I hack past that, I don't see the warnings anymore. But if I'm
> > > not mistaken, rvin_release() is not called at all anymore. I can
> > > also see plenty of leaks with kmemleak. Those seem to originate from
> > > max96712, but... I don't see max96712's remove() getting called
> > > either when I remove the module.

FWIW module unloading is tricky and AFIK not supported upstream. There 
is a reason the VIN capture pipeline have .suppress_bind_attrs = true 
;-)

> > 
> > Sorry, never mind that. I had the debug print in max96714... =). So
> > max96712 remove() gets called, but it's missing:
> > 
> > +       v4l2_subdev_cleanup(&priv->sd);
> > +       media_entity_cleanup(&priv->sd.entity);
> > +       v4l2_ctrl_handler_free(&priv->ctrl_handler);
> > 
> > But now I'm seeing rvin_release() getting called (no warnings). I'm not
> > sure what changed. Maybe I had some extra changes lying around. Let me
> > test the series a bit more...
> 
> I haven't seen the warning anymore, so I believe your series indeed fixes
> the issue.

Awesome! I will try to find some time soon and rebase that series, I 
think I have an idea on how to avoid having to mock around in the 
v4l-async core and still achieve the same thing.

> 
>  Tomi
> 
> > 
> >   Tomi
> > 
> > > I'm testing on Renesas' whitehawk board, with max96712 TPG. If you
> > > have that board, and want to try module loading & unloading, you
> > > also need to fix the max96712 remove function:
> > > 
> > > -       struct max96712_priv *priv = i2c_get_clientdata(client);
> > > +       struct v4l2_subdev *sd = i2c_get_clientdata(client);
> > > +       struct max96712_priv *priv = v4l2_get_subdevdata(sd);
> > > 
> > >   Tomi
> > > 
> > > > On 2024-09-06 12:57:50 +0300, Tomi Valkeinen wrote:
> > > > > Hi Niklas,
> > > > > 
> > > > > There seems to be a race in rcar-v4l2.c, causing
> > > > > WARN_ON(entity->use_count < 0) in pipeline_pm_power_one().
> > > > > 
> > > > > If my understanding is correct, the VIN v4l2 nodes are being created
> > > > > (rvin_v4l2_register), meaning they are userspace accessible,
> > > > > but the media
> > > > > pipeline as a whole is not ready yet (e.g. media links).
> > > > > 
> > > > > So what happens is that after some video nodes have been created, the
> > > > > userspace opens them (I think it's udevd checking the new
> > > > > device nodes),
> > > > > causing rvin_open(). rvin_open() goes through the media
> > > > > graph and does some
> > > > > PM enabling (I'm not familiar with the legacy v4l2_pipeline_pm_get()).
> > > > > However, as the links are not there, it doesn't really
> > > > > enable much at all.
> > > > > 
> > > > > Then the driver goes forward and finishes with the media graph.
> > > > > 
> > > > > Then the userspace closes the opened video nodes,
> > > > > rvin_release() gets called
> > > > > and it goes through the media graph, which now contains all
> > > > > the entities,
> > > > > and powers them down. As the entities were never powered up, we hit the
> > > > > use_count warning.
> > > > > 
> > > > > This happens quite often to me when loading the modules, but
> > > > > I think it can
> > > > > be made to happen more often by adding msleep(1000) to the beginning of
> > > > > rvin_release(), thus ensuring that the graph setup is
> > > > > finished before the
> > > > > rvin_release() proceeds (and hoping that the graph setup was
> > > > > not ready when
> > > > > rvin_open() was called).
> > > > > 
> > > > >   Tomi
> > > > > 
> > > > 
> > > 
> > 
> 

-- 
Kind Regards,
Niklas Söderlund




[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux