Re: [PATCH v4 3/7] genirq: Introduce irq_suspend_one() and irq_resume_one() callbacks

Doug Anderson <dianders@xxxxxxxxxxxx> · Thu, 13 Aug 2020 20:04:55 -0700

Hi,

On Thu, Aug 13, 2020 at 7:07 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Doug,
>
> On Thu, Aug 13 2020 at 15:58, Doug Anderson wrote:
> > On Thu, Aug 13, 2020 at 3:09 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> > * If this interrupt fires while the system is suspended then please
> >> > wake the system up.
> >>
> >> Well, that's kinda contradicting itself. If the interrupt is masked then
> >> what is the point? I'm surely missing something subtle here.
> >
> > This is how I've always been told that the API works and there are at
> > least a handful of drivers in the kernel whose suspend routines both
> > enable wakeup and call disable_irq().  Isn't this also documented as
> > of commit f9f21cea3113 ("genirq: Clarify that irq wake state is
> > orthogonal to enable/disable")?
>
> Fair enough. The wording there is unfortunate and I probably should have
> spent more brain cycles before applying it. It suggests that this is a
> pure driver problem. I should have asked some of the questions I asked
> now back then :(

I mean, certainly a driver could be rewritten not to do this.  ...and,
in fact, the easier approach (for just solving my immediate concern)
would be to change cros-ec not to do this.  However, it was my
understanding that what cros-ec was doing was actually just fine and
part of the API to drivers.  This understanding was solidified when
the patch I mentioned landed.  When looking at this before I found
that certainly there are other drivers that do this and it felt better
to implement the proper thing rather than add a hack to cros-ec to
work around the Qualcomm pinctrl driver.

In general the idea here, I think, is that in the "suspend" call of a
driver it might want to disable interrupts so that it doesn't have to
deal with them after the driver has configured things (and adjusted
its internal data structures) for suspend.  However, it might still
want its interrupt to cause a wakeup.  ...so it wants the wakeup to
happen (and its resume call to be made to get everything back in the
right state) and at the end of the resume call it wants to enable its
interrupt handler again.  That seems like a sane design pattern to me,
but maybe I'm crazy.  Yes, I guess the driver could implement the
"noirq" suspend function, but sometimes it's simpler to have a single
suspend function that first leverages interrupts, then disables them
at an exact point it can control, and then finishes adjusting its
state.

I'll also note that the concept that a masked interrupt can "wake you
up" is also not unlike how ARM SoCs work, which is part of what made
me feel like this API was fine.  Specifically if you have interrupts
masked at the CPU level and then enter "WFI" (wait for interrupt) it
will wake up (or come out of idle) from one of those masked
interrupts.

> >> If that's the main problem which is solved in these callbacks, then I
> >> really have to ask why this has not been raised years ago. Why can't
> >> people talk?
> >
> > Not all of us have the big picture that you do to know how things
> > ought to work, I guess.  If nothing else someone looking at this
> > problem would think: "this must be a common problem, let's go see how
> > all the other places do it" and then they find how everyone else is
> > doing it and do it that way.  It requires the grander picture that a
> > maintainer has in order to say: whoa, everyone's copying the same
> > hack--let's come up with a better solution.
>
> That's not the point. I know how these things happen, but I fail to
> understand why nobody ever looks at this and says: OMG, I need to do yet
> another variant of copy&pasta of the same thing every other driver
> does. Why is there no infrastructure for that?
>
> Asking that question does not require a maintainer who always encouraged
> people to talk about exactly these kind of things instead of going off
> and creating the gazillionst broken copy of the same thing with yet
> another wart working around core code problems and thereby violating
> layering and introducing bugs which wouldn't exist otherwise.
>
> Spare me all the $corp reasons. I've heard all of them and if not then
> the not yet known reason won't be any more convincing. :)

As per above, if I was simply motivated to hack it to get it done I
would have suggested we just muck with cros_ec.  I certainly do have a
bias for getting things done and getting things landed, but I also try
to pride myself in not saying that we should just accept any old hack.
Perhaps many people posting patches just want any old crap landed, but
I'd like to think I'm not one of them.

> One of the most underutilized strengths of FOSS is that you can go and
> ask someone who has the big picture in his head before you go off and
> waste time on distangling copy&pasta, dealing with the resulting obvious
> bugs and then the latent ones which only surface 3 month after the
> product has shipped. Or like in this case figure out that the copy&pasta
> road is a dead end and then create something new without seeing the big
> picture and having analyzed completely what consequences this might have.

I've found that one of the best ways to get something figured out is
to post a patch, even if it's not perfect.  Perhaps in cases where
you're involved, but in general most cases where you just ask a
question you get ignored.  You've gotta post a patch.  This solution
was the best I was able to come up with and was discussed with several
people before posting.

> I don't know how much hours you and others spent on this. I surely know
> that after you gave me proper context it took me less than an hour to
> figure out that one problem you were trying to solve was already solved
> and the other one was just a matter of doing the counterpart of it. I
> definitely spent way more time on reviewing and debating.

I did spend quite a bit of time on it, though perhaps it's not
obvious.  Though I agree that the patch in isolation didn't have a
good enough description, I felt like it combined with the later
patches in the series did show what I was trying to do.

> So if you had asked upfront, I probably would have spent quite some time
> on it as well depending on the quality of the question and explanation
> but the total amount on both sides would have been significantly lower,
> which I consider a win-win situation.
>
> Of course I know that my $corp MBA foo is close to zero, so I just can
> be sure that it would have been a win for me :)
>
> Seriously, we need to promote a 'talk to each other' culture very
> actively. The people with the big picture in their head, aka
> maintainers, are happy to answer questions and they also want that
> others come forth and say "this makes no sense" instead of silently
> accepting that the five other drivers do something stupid. This would
> help to solve some of the well known problems:
>
>  - Maintainer scalability
>
>    I rather discuss a problem with you at the conceptual level upfront
>    instead of reviewing patches after the fact and having to tell you
>    that it's all wrong. The latter takes way more time.
>
>    Having a quick and dirty POC for illustration is fine and usually
>    useful.

OK, I will try to remember that, in the future, I should send
questions rather than patches to you.  I'm always learning the
workflows of the different maintainers, so sorry for killing so much
time.  :(

>  - Maintainer blinders
>
>    Maintainers need input from the outside as any other people because
>    they become blind to shortcomings in the area they are responsible
>    for as any other person. Especially if they maintain complex and
>    highly active subsystems.
>
>  - Submitter frustration
>
>    You spent a huge amount of time to come up with a solution for
>    something and then you get told by the maintainer/reviewer that the
>    time spent was wasted and your solution is crap. It does not matter
>    much what the politeness level of that message is. It sets you back
>    and causes frustration on both ends.
>
>  - Turn around times
>
>    A lot of time can be spared by talking to each other early. A half
>    baken POC patch is fine for opening such a discussion, but going down
>    all the way and then having the talk over the final patch review is
>    more than suboptimal and causes grief on both sides.

Yup, definitely understand.  Again, sorry for the misunderstandings
this time around and hopefully we can find better ways to interact in
the future.

> >>   "These two callbacks are interesting because sometimes an irq chip
> >>    needs to know about suspend/resume."
> >>
> >> Really valuable and precise technical information.
> >
> > Funny to get yelled at for not providing a detailed enough changelog.
> > Usually people complain that my changelogs are too detailed.  Sigh.
>
> The complaint you might get from me about an overly detailed changelog
> is that it has redundant or pointless information in it, e.g.
>
>   - the 500 lines of debug dump containing about 10 lines of valuable
>     information which you already decoded and condensed in order to
>     figure the problem out.
>
>   - anecdotes around the discovery which carry zero information and
>     often show that that the scope of the problem was not fully
>     understood.
>
>   - pointless examples of how to trigger the fail
>
>   - In depth explanaations of what the patch does instead of a concise
>     explanation at the conceptual level.
>
> You won't hear me complain about a concise and coherent in depth
> technical explanation of a problem.
>
> Writing changelogs is an art and I surely look at some of my own
> changelogs written long ago and yell at myself from time to time.
>
> Reading a patch goes top down obviously:
>
>       1) Subject line
>       2) Changelog
>       3) Patch.
>
> If I have to rumage for my crystal ball before #3 then I already spent
> more time than necessary. If the thing is some random feature then I
> might just say: try again. But if I get the sense that it is about a bug
> or  has some smell of a shorrcoming in the core code then I have to bite
> the bullet and decode it the hard way. Not the most efficient way. And
> from experience I can tell you that if #1 and #2 are already problematic
> then #3 needs some serious scrutiny in most cases.
>
> >> Hint: "Sometimes a chip needs to know" does not qualify :)
> >
> > Clearly I am not coherent.  ;-)  My only goal was to help enable
> > interrupts that were disabled / marked as wakeup (as per above,
> > documented to be OK) to work on Qualcomm chips.  This specifically
> > affects me because a driver that I need to work (cros_ec) does this.
>
> Mission acoomplished :)
>
> > If IRQCHIP_UNMASK_WAKEUP_ON_SUSPEND is good to add then it sounds like
> > a great plan to me.
>
> If it solves the problem and from what you explained it should do so
> then this is definitely the right way to go.

Wonderful!  Looking forward to Maulik's post doing it this way.

-Doug