On Tue 28 Jul 13:11 PDT 2020, Lina Iyer wrote:

> On Tue, Jul 28 2020 at 13:51 -0600, Stephen Boyd wrote:
> > Quoting Lina Iyer (2020-07-28 09:52:12)
> > > On Mon, Jul 27 2020 at 18:45 -0600, Stephen Boyd wrote:
> > > >Quoting Lina Iyer (2020-07-24 09:28:25)
> > > >> On Fri, Jul 24 2020 at 03:03 -0600, Rajendra Nayak wrote:
> > > >> >Hi Maulik/Lina,
> > > >> >
> > > >> >On 7/23/2020 11:36 PM, Stanimir Varbanov wrote:
> > > >> >>Hi Rajendra,
> > > >> >>
> > > >> >>After applying 2,3 and 4/5 patches on linaro-integration v5.8-rc2 I see
> > > >> >>below messages on db845:
> > > >> >>
> > > >> >>qcom-venus aa00000.video-codec: dev_pm_opp_set_rate: failed to find
> > > >> >>current OPP for freq 533000097 (-34)
> > > >> >>
> > > >> >>^^^ This one is new.
> > > >> >>
> > > >> >>qcom_rpmh TCS Busy, retrying RPMH message send: addr=0x30000
> > > >> >>
> > > >> >>^^^ and this message is annoying, can we make it pr_debug in rpmh?
> > > >> >
> > > >> How annoyingly often do you see this message?
> > > >> Usually, this is an indication of bad system state either on remote
> > > >> processors in the SoC or in Linux itself. On a smooth sailing build you
> > > >> should not see this 'warning'.
> > > >>
> > > >> >Would you be fine with moving this message to a pr_debug? It's currently
> > > >> >a pr_info_ratelimited()
> > > >> I would rather not, moving this out of sight will mask a lot of serious
> > > >> issues that otherwise bring attention to the developers.
> > > >>
> > > >
> > > >I removed this warning message in my patch posted to the list[1]. If
> > > >it's a serious problem then I suppose a timeout is more appropriate, on
> > > >the order of several seconds or so and then a pr_warn() and bail out of
> > > >the async call with an error.
> > > >
> > > The warning used to capture issues that happen within a second and it
> > > helps capture system related issues.
> > > Timing out after many seconds
> > > overlooks the system issues that generally tend to resolve themselves, but
> > > nevertheless need to be investigated.
> > >
> >
> > Is it correct to read "system related issues" as performance problems
> > where the thread is spinning forever trying to send a message and it
> > can't? So the problem is mostly that it's an unbounded amount of time
> > before the message is sent to rpmh and this printk helps identify those
> > situations where that is happening?
> >
> Yes, but mostly a short period of time like when other processors are in
> the middle of a restart or resource state changes have taken unusual
> amounts of time. The system will generally recover from this without
> crashing in this case. User action is investigation of the situation
> leading to these messages.
>

Given that these messages show up from time to time and are seemingly
harmless, users such as myself take the action of ignoring these
printouts.

In the cases where I do see these messages it seems, as you say, to be
related to something happening in the firmware. So it's not something
that a user typically could investigate/debug anyway.

As such I do second Doug's request of not printing what looks like error
messages unless there is a persistent problem - but provide some means
for the few who would find them useful.

Regards,
Bjorn

> > Otherwise as you say above it's a bad system state where the rpmh
> > processor has gotten into a bad state like a crash? Can we recover from
> > that? Or is the only recovery a reboot of the system? Does the rpmh
> > processor reboot the system if it crashes?
> We cannot recover from such a state. The remote processor will reboot if
> it detects a failure at its end. If the system entered a bad state, it
> is possible that RPMH requests start timing out in Linux and remote
> processor may not detect it. Hence, the timeout in rpmh_write() API.
> The advised course of action is a restart as there is no way to recover
> from this state.
>
> --Lina
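
For illustration only, the "stay quiet unless the problem persists" idea discussed above could be sketched roughly as follows. This is a userspace simulation, not the rpmh driver code: fake_tcs, fake_tcs_try_send(), send_with_bounded_retry() and QUIET_RETRIES are hypothetical names invented for this sketch, and a real driver would likely bound the wait by time rather than by attempt count and would use pr_warn() instead of fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: number of failed attempts tolerated before any message. */
#define QUIET_RETRIES 3

/* Hypothetical stand-in for the hardware: busy for a fixed number of tries. */
struct fake_tcs {
	int busy_attempts_left;
};

/* Returns true if the (simulated) controller accepted the message. */
static bool fake_tcs_try_send(struct fake_tcs *tcs)
{
	if (tcs->busy_attempts_left > 0) {
		tcs->busy_attempts_left--;
		return false;
	}
	return true;
}

struct send_result {
	int status;   /* 0 on success, -1 if we gave up */
	int warnings; /* how many messages were printed */
};

/*
 * Retry up to max_tries times. Short busy periods are silent; one warning
 * is printed once more than QUIET_RETRIES attempts have failed, and a
 * second one if we give up entirely.
 */
static struct send_result send_with_bounded_retry(struct fake_tcs *tcs,
						  int max_tries)
{
	struct send_result res = { .status = -1, .warnings = 0 };

	assert(max_tries > 0);

	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (fake_tcs_try_send(tcs)) {
			res.status = 0;
			return res;
		}
		if (attempt == QUIET_RETRIES + 1) {
			fprintf(stderr, "TCS still busy after %d attempts\n",
				attempt);
			res.warnings++;
		}
	}

	fprintf(stderr, "giving up after %d attempts\n", max_tries);
	res.warnings++;
	return res;
}
```

With this shape, a controller that is busy for only a couple of attempts produces no output at all, while a persistently busy one produces a warning and eventually an error return that the caller can act on.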