Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error

Thierry Reding <thierry.reding@xxxxxxxxx> · Fri, 22 May 2020 16:43:12 +0200

On Fri, May 22, 2020 at 04:23:18PM +0300, Dan Carpenter wrote:
> On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> > On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@xxxxxxxxxx wrote:
> > > > > > Hi, Dan,
> > > > > >
> > > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > > many cases that assume pm_runtime_get_sync() will change the PM usage
> > > > > > counter on error. According to my static analysis results, the number of these
> > > > > > "right" cases is larger. Adjusting __pm_runtime_resume() directly would introduce
> > > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > > >
> > > > >
> > > > > That's why I was saying that we may need to introduce a new replacement
> > > > > function for pm_runtime_get_sync() that works as expected.
> > > > >
> > > > > There is no reason why we have to live with the old behavior.
> > > > 
> > > > What exactly do you mean by "the old behavior"?
> > > 
> > > I'm suggesting we leave pm_runtime_get_sync() alone but add a new
> > > function called pm_runtime_get_sync_resume() which does something
> > > like this:
> > > 
> > > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > > {
> > > 	int ret;
> > > 
> > > 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > > 	if (ret < 0) {
> > > 		pm_runtime_put(dev);
> > > 		return ret;
> > > 	}
> > > 	return 0;
> > > }
> > > 
> > > I'm not sure whether pm_runtime_put() is the correct thing to do.  The other
> > > thing is that this always returns zero on success.  I don't know whether
> > > drivers ever care to differentiate between a return of one and zero.
> > > 
> > > Then, if any callers expect that behavior, we update them to use the
> > > new function.
> > 
> > Does that really have many benefits, though? I understand that this
> > would perhaps be easier to use because it is more in line with how other
> > functions operate. On the other hand, in some cases you may want to call
> > a different version of pm_runtime_put() on failure, as discussed in
> > other threads.
> 
> I wasn't CC'd on the other threads so I don't know.  :/

It was actually earlier in this thread; see here, for example:

	http://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@xxxxxxxxxx/#2438776

> I have always assumed it was something like this but I don't know the
> details and there is no documentation.

Now, I don't know more than you do, but it sounds to me like there are
multiple valid ways that we can use to drop the runtime PM reference, and
whatever we choose to do in this new function may not always be the
right thing.
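A userspace toy model may help make this concrete. It is not kernel code, and all toy_* names are hypothetical. It assumes the semantics discussed in this thread: the get-sync call bumps the usage counter even when the resume fails, and it returns 1 if the device was already active. The two release variants loosely mimic pm_runtime_put() (drop the reference and run an idle check once the count reaches zero; the real function queues that check asynchronously) and pm_runtime_put_noidle() (only drop the reference).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device's runtime PM state. */
struct toy_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool active;		/* true once the device has been resumed */
	bool resume_fails;	/* force the modeled resume to fail */
	int idle_checks;	/* how many times an idle check ran */
};

/*
 * Models pm_runtime_get_sync(): the counter is incremented first and is
 * NOT dropped again if the resume fails.
 */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	if (dev->active)
		return 1;		/* already active */
	if (dev->resume_fails)
		return -EIO;		/* reference is still held */
	dev->active = true;
	return 0;
}

/* Models pm_runtime_put_noidle(): only drops the reference. */
static void toy_put_noidle(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);	/* dropping an unheld ref is a bug */
	dev->usage_count--;
}

/*
 * Models pm_runtime_put(): drops the reference and, once it reaches zero,
 * runs an idle check (synchronously here, for simplicity).
 */
static void toy_put(struct toy_dev *dev)
{
	toy_put_noidle(dev);
	if (dev->usage_count == 0)
		dev->idle_checks++;
}
```

Both variants balance the counter after a failed resume; they differ only in whether the idle path runs afterwards. Which one is right depends on the driver, which is why a helper that hardcodes one of them may not fit every caller.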

> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
> You're essentially arguing that it's a #1 on Rusty's scale, but ideally
> we would want to be at #7.

I think we could probably get it to at least a 3 or a 4 on that list if
we add a bit of documentation and fix all existing users.

Yes, 7 would be better than that, but I think we have to weigh the cost
of the added fragmentation versus the benefits that it gives us.

> > Even ignoring that issue, any existing callsites that are leaking the
> > reference would have to be updated to call the new function, which would
> > be pretty much the same amount of work as updating the callsites to fix
> > the leak, right?
> 
> With the current API we're constantly adding bugs.  I imagine that once
> we add a straightforward default and some documentation then we will
> solve this.

In my experience this stuff is often copy/pasted, so once we fix up all
of the bugs (and perhaps even add a coccinelle script) we should be
seeing fewer and fewer new bugs being added.
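For reference, the callsite fix itself is small. Below is a userspace sketch (not kernel code; the toy_* and *_caller names are hypothetical) contrasting the leaky pattern with a fixed one. It assumes pm_runtime_get_sync() leaves the counter incremented on failure and that pm_runtime_put_noidle() is the chosen way to drop the reference on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy device; only the usage counter matters here. */
struct toy_dev {
	int usage_count;
	bool resume_fails;
};

/* Models pm_runtime_get_sync(): counter bumped even on failure. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Models pm_runtime_put_noidle(): drop the reference only. */
static void toy_put_noidle(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Buggy pattern: bails out without dropping the reference. */
static int leaky_caller(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0)
		return ret;		/* leaks one reference */

	/* ... touch the hardware ... */
	toy_put_noidle(dev);	/* a real driver would likely use pm_runtime_put() */
	return 0;
}

/* Fixed pattern: balance the counter on the error path too. */
static int fixed_caller(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0) {
		toy_put_noidle(dev);	/* undo the increment from get_sync */
		return ret;
	}

	/* ... touch the hardware ... */
	toy_put_noidle(dev);	/* a real driver would likely use pm_runtime_put() */
	return 0;
}
```

After a failed resume, leaky_caller() leaves the counter at 1 while fixed_caller() returns it to 0, which is the kind of imbalance a coccinelle script could flag.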

That said, I'm not opposed to adding a new function if we can make it
actually result in an overall improvement. What I'd hate to do is add a
new API that we all think is superior but then ends up not being usable
in half of the cases.

> > So if instead we just fix up the leaks, we might have a case of an API
> > that doesn't work as some of us (myself included) expected it, but at
> > least it would be consistent. If we add another variant things become
> > fragmented and therefore even more complicated to use and review.
> 
> That's the approach that we've been trying and it's clearly not working.

I think this is something we can likely solve through education and
documentation. Runtime PM is still a fairly new topic that not a lot of
people have experience with (at least if I extrapolate from the many
issues I've run into lately related to runtime PM), so I think it just
takes time for everyone to catch up. This looks similar to me to how we
used to have every allocation failure print out an error, even though
the allocator already complains pretty loudly when things go wrong. Now
we've removed most (if not all) of the redundant error messages and it's
become common knowledge among most maintainers, so new instances
typically get caught during review.

But again, if you can come up with a good alternative that works for the
majority of cases I think that would also be fine. Getting things right
without actually knowing any of the background is obviously better than
having to actually educate people. =)

Thierry
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel