Stewart Smith <stewart@xxxxxxxxxxxxxxxxxx> writes: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. Looks like this has always been broken, so: Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV platform") > Cc: stable@xxxxxxxxxxxxxxx And therefore that should be: Cc: stable@xxxxxxxxxxxxxxx # v3.19+ > Signed-off-by: Stewart Smith <stewart@xxxxxxxxxxxxxxxxxx> > --- > drivers/rtc/rtc-opal.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c > index 9c18d6fd8107..fab19e3e2fba 100644 > --- a/drivers/rtc/rtc-opal.c > +++ b/drivers/rtc/rtc-opal.c > @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms) > static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > { > long rc = OPAL_BUSY; > + int retries = 10; > u32 y_m_d; > u64 h_m_s_ms; > __be32 __y_m_d; > @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); > if (rc == OPAL_BUSY_EVENT) > opal_poll_events(NULL); > - else > + else if (retries-- && (rc == OPAL_HARDWARE > + || rc == OPAL_INTERNAL_ERROR)) > msleep(10); > + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) > + break; > } This is a pretty gross API at this point. That's basically a score of 2 on Rusty's API usability index ("Read the implementation and you'll get it right" - http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html). The docs don't mention OPAL_INTERNAL_ERROR being transient, nor do they mention OPAL_BUSY. Can we at least do a wrapper function in opal.h for drivers to use that handles some or all of these cases? cheers -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html