On Mon, Oct 10, 2022 at 7:50 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Oct 10, 2022 at 04:47:50PM +0200, Rafael J. Wysocki wrote: > > Hi Mel, > > > > Thanks for the report! > > > > On Mon, Oct 10, 2022 at 4:25 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > Hi Rafael, > > > > > > I'm seeing intermittent boot failures after 6492fed7d8c9 ("rtc: rtc-cmos: > > > Do not check ACPI_FADT_LOW_POWER_S0") due to a NULL pointer exception > > > early in boot. It fails to boot 5 times after 10 boot attempts and I've > > > only observed it on one machine so far. Either a revert or the patch below > > > fixes it but it's unlikely it is the correct fix. > > > > > > --- drivers/rtc/rtc-cmos.c.orig 2022-10-10 15:11:50.335756567 +0200 > > > +++ drivers/rtc/rtc-cmos.c 2022-10-10 15:11:53.211756691 +0200 > > > @@ -1209,7 +1209,7 @@ > > > * Or else, ACPI SCI is enabled during suspend/resume only, > > > * update rtc irq in that case. > > > */ > > > - if (cmos_use_acpi_alarm()) > > > + if (cmos_use_acpi_alarm() && cmos) > > > cmos_interrupt(0, (void *)cmos->rtc); > > > else { > > > /* Fix me: can we use cmos_interrupt() here as well? */ > > > > It looks like I've exposed a race condition there. > > > > Generally speaking, it is misguided to install an event handler that > > is not ready to handle the event at that time before making sure that > > the event is disabled. > > > > Does the attached patch help? > > > > It failed 3/10 times. This is still not acceptable. > That's less than the previous 5/10 failures but I > cannot be certain it helped without running a lot more boot tests. The > failure happens in the same function as before. I've overlooked the fact that acpi_install_fixed_event_handler() enables the event on success, so it is a bug to call it when the handler is not ready. It should help to only enable the event after running cmos_do_probe() where the driver data pointer is set, so please try the attached patch.
--- drivers/rtc/rtc-cmos.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) Index: linux-pm/drivers/rtc/rtc-cmos.c =================================================================== --- linux-pm.orig/drivers/rtc/rtc-cmos.c +++ linux-pm/drivers/rtc/rtc-cmos.c @@ -1352,7 +1352,7 @@ static void cmos_check_acpi_rtc_status(s static int cmos_pnp_probe(struct pnp_dev *pnp, const struct pnp_device_id *id) { - cmos_wake_setup(&pnp->dev); + int ret; if (pnp_port_start(pnp, 0) == 0x70 && !pnp_irq_valid(pnp, 0)) { unsigned int irq = 0; @@ -1364,13 +1364,20 @@ static int cmos_pnp_probe(struct pnp_dev if (nr_legacy_irqs()) irq = RTC_IRQ; #endif - return cmos_do_probe(&pnp->dev, + ret = cmos_do_probe(&pnp->dev, pnp_get_resource(pnp, IORESOURCE_IO, 0), irq); } else { - return cmos_do_probe(&pnp->dev, + ret = cmos_do_probe(&pnp->dev, pnp_get_resource(pnp, IORESOURCE_IO, 0), pnp_irq(pnp, 0)); } + + if (ret) + return ret; + + cmos_wake_setup(&pnp->dev); + + return 0; } static void cmos_pnp_remove(struct pnp_dev *pnp) @@ -1454,10 +1461,9 @@ static inline void cmos_of_init(struct p static int __init cmos_platform_probe(struct platform_device *pdev) { struct resource *resource; - int irq; + int irq, ret; cmos_of_init(pdev); - cmos_wake_setup(&pdev->dev); if (RTC_IOMAPPED) resource = platform_get_resource(pdev, IORESOURCE_IO, 0); @@ -1467,7 +1473,13 @@ static int __init cmos_platform_probe(st if (irq < 0) irq = -1; - return cmos_do_probe(&pdev->dev, resource, irq); + ret = cmos_do_probe(&pdev->dev, resource, irq); + if (ret) + return ret; + + cmos_wake_setup(&pdev->dev); + + return 0; } static int cmos_platform_remove(struct platform_device *pdev)