Surface Pro 3 is a typical platform where the suspend/resume loop problem
can be seen. The problem is due to a systemd 229 bug:
1. "ignore": always triggers an endless suspend/resume loop;
2. "open": the suspend/resume loop can sometimes be stopped;
3. "method": always triggers an endless suspend/resume loop.
The buggy systemd unexpectedly waits for an explicit "open" event after
boot/resume, otherwise it suspends. However, even when the kernel sends a
faked "open" to it, its state machine is still wrong: systemd may not
respond to "close" events that arrive after "open", or may suddenly
suspend without having received any new event.

Recent systemd 233 has fixed this issue:
1. "ignore": everything works fine;
2. "open": no suspend/resume cycle, but sometimes the platform cannot be
   suspended again after the first resume;
3. "method": no suspend/resume cycle, but the platform can never be
   suspended again after the first resume.
The conclusion is: for the suspend/resume cycle issue, "ignore" mode fixes
everything, but the current "method" mode is still buggy. The difference
arises because the button driver only implements complement switch events
for "ignore" mode. Without complement switch events, a firmware-triggered
"close" cannot be delivered to userspace (confirmed by evemu-record).

The root cause of the lid state issues is the variation in platform
firmware implementations:
1. Some platforms send "open" events to the OS, and the events arrive
   before the button driver is resumed;
2. Some platforms send "open" events to the OS, but the events arrive
   after the button driver is resumed, e.g., Samsung N210+;
3. Some platforms never send "open" events to the OS, but send "open"
   events to update the cached _LID return value, and the update events
   arrive before the button driver is resumed;
4. Some platforms never send "open" events to the OS, but send "open"
   events to update the cached _LID return value, and the update events
   arrive after the button driver is resumed, e.g., Surface Pro 3;
5. Some platforms never send "open" events, and the _LID return value
   sticks to "close", e.g., Surface Pro 1.

Now check the docking external display issues (see links below):
1. For case 1, both "method" and "ignore" modes work correctly;
2. For cases 2/4/5, neither "method" nor "ignore" mode works correctly;
3. For case 3, "method" mode works correctly while "ignore" mode does not.
The conclusion is: for the docking external display issue, although the
graphics layer (graphics drivers or desktop managers) still needs to be
improved to avoid breakage on case 2/4/5 platforms, there is a case where
"method" mode works better. Because of the regression rule and case 3
(the platforms reported in the links should all be case 3 platforms), the
ACPI subsystem has been pushed to revert to "method" mode, and the
libinput developers have volunteered to provide workarounds for as long as
the graphics layer is not fixed or systemd is not updated.

This patch therefore extends complement switch event support to the other
modes using a new criterion: a complement switch event is generated only
for a BIOS-notified "close". As a result, when the button driver is
reverted to "method" mode, it won't behave worse than "ignore" mode on a
fixed systemd.

Tested with systemd 233: after applying this patch, all modes work fine
(no suspend/resume loop, and the platform can be suspended any number of
times).
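To illustrate why a repeated firmware "close" is lost without a complement
switch event, here is a minimal user-space sketch. It assumes a simplified
model of the input core's EV_SW handling (an event is forwarded only when
its value differs from the cached switch state); struct sw_model,
model_report_switch() and model_notify_lid() are hypothetical names and are
not part of drivers/acpi/button.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model (not kernel code) of how the input core
 * treats a switch event: it reaches userspace only if its value differs
 * from the cached switch state.
 */
struct sw_model {
	bool sw_lid;          /* cached SW_LID value: true = lid closed */
	int delivered[8];     /* SW_LID values userspace actually saw */
	size_t n_delivered;
};

/* Mimics the redundancy check: drop the event if the value is unchanged. */
static void model_report_switch(struct sw_model *m, bool value)
{
	if (m->sw_lid == value)
		return;               /* redundant: userspace never sees it */
	m->sw_lid = value;
	if (m->n_delivered < sizeof(m->delivered) / sizeof(m->delivered[0]))
		m->delivered[m->n_delivered++] = value;
}

/*
 * Deliver a firmware-notified lid state (lid_state: 1 = open, 0 = closed,
 * as returned by _LID). With complement enabled, a repeated "close" is
 * preceded by a faked "open" so that the reliable "close" is not
 * swallowed; "open" is never complemented.
 */
static void model_notify_lid(struct sw_model *m, int lid_state,
			     bool complement)
{
	assert(lid_state == 0 || lid_state == 1);
	if (complement && !lid_state && m->sw_lid)
		model_report_switch(m, false);  /* complement: faked "open" */
	model_report_switch(m, !lid_state);     /* the reliable event */
}
```

In this model, a second "close" while the lid is already reported closed
produces nothing without complement, and a faked "open" followed by the
real "close" with complement; a repeated "open" produces nothing either way.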
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1430259
Cc: <systemd-devel@xxxxxxxxxxxxxxxxxxxxx>
Cc: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
Cc: Peter Hutterer <peter.hutterer@xxxxxxxxx>
Signed-off-by: Lv Zheng <lv.zheng@xxxxxxxxx>
---
 drivers/acpi/button.c | 116 +++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 725a15a..36485cf 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -108,6 +108,7 @@ struct acpi_button {
 	unsigned long pushed;
 	int last_state;
 	ktime_t last_time;
+	bool last_is_bios;
 	bool suspended;
 };
 
@@ -144,78 +145,71 @@ static int acpi_lid_notify_state(struct acpi_device *device,
 	struct acpi_button *button = acpi_driver_data(device);
 	int ret;
 	ktime_t next_report;
-	bool do_update;
 
 	/*
-	 * In lid_init_state=ignore mode, if user opens/closes lid
-	 * frequently with "open" missing, and "last_time" is also updated
-	 * frequently, "close" cannot be delivered to the userspace.
-	 * So "last_time" is only updated after a timeout or an actual
-	 * switch.
+	 * Ignore frequently replayed switch events.
+	 *
+	 * AML tables can put Notify(LID, xxx) in a notification method,
+	 * and handling the hardware events by executing the entry methods
+	 * (ex., _Qxx) may cause the notification method to be invoked
+	 * several times.
+	 * This check doesn't apply to the faked events because if a BIOS
+	 * notification comes after a faked event, it must pass this check
+	 * in order to be reliably delivered to user space.
 	 */
-	if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
-	    button->last_state != !!state)
-		do_update = true;
-	else
-		do_update = false;
-
 	next_report = ktime_add(button->last_time,
 				ms_to_ktime(lid_report_interval));
-	if (button->last_state == !!state &&
-	    ktime_after(ktime_get(), next_report)) {
+	if (button->last_is_bios && button->last_state == !!state &&
+	    !ktime_after(ktime_get(), next_report))
+		return 0;
+
+	/*
+	 * Send the unreliable complement switch event:
+	 *
+	 * On most platforms, the lid device is reliable. However there are
+	 * exceptions:
+	 * 1. Platforms returning initial lid state as "close" by default
+	 *    after booting/resuming:
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=89211
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=106151
+	 * 2. Platforms never reporting "open" events:
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=106941
+	 * On these buggy platforms, the usage model of the ACPI lid device
+	 * actually is:
+	 * 1. The initial returning value of _LID may not be reliable.
+	 * 2. The open event may not be reliable.
+	 * 3. The close event is reliable.
+	 *
+	 * But SW_LID is typed as input switch event, the input layer
+	 * checks if the event is redundant. Hence if the state is not
+	 * switched, the userspace cannot see this platform triggered
+	 * reliable event. By inserting a complement switch event, it then
+	 * is guaranteed that the platform triggered reliable one can
+	 * always be seen by the userspace.
+	 */
+	if (button->last_state == !!state && is_bios_event) {
 		/* Complain the buggy firmware */
 		pr_warn_once("The lid device is not compliant to SW_LID.\n");
 
 		/*
-		 * Send the unreliable complement switch event:
-		 *
-		 * On most platforms, the lid device is reliable. However
-		 * there are exceptions:
-		 * 1. Platforms returning initial lid state as "close" by
-		 *    default after booting/resuming:
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=89211
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=106151
-		 * 2. Platforms never reporting "open" events:
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=106941
-		 * On these buggy platforms, the usage model of the ACPI
-		 * lid device actually is:
-		 * 1. The initial returning value of _LID may not be
-		 *    reliable.
-		 * 2. The open event may not be reliable.
-		 * 3. The close event is reliable.
-		 *
-		 * But SW_LID is typed as input switch event, the input
-		 * layer checks if the event is redundant. Hence if the
-		 * state is not switched, the userspace cannot see this
-		 * platform triggered reliable event. By inserting a
-		 * complement switch event, it then is guaranteed that the
-		 * platform triggered reliable one can always be seen by
-		 * the userspace.
+		 * Do not generate complement switch event for "open"
+		 * events - faking "close" events can trigger unexpected
+		 * behaviors.
+		 * Thus only generate complement switch event for BIOS
+		 * notified "close".
 		 */
-		if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
-			do_update = true;
-			/*
-			 * Do generate complement switch event for "close"
-			 * as "close" is reliable and wrong "open" won't
-			 * trigger unexpected behaviors.
-			 * Do not generate complement switch event for
-			 * "open" as "open" is not reliable and wrong
-			 * "close" will trigger unexpected behaviors.
-			 */
-			if (!state) {
-				input_report_switch(button->input,
-						    SW_LID, state);
-				input_sync(button->input);
-			}
+		if (!state) {
+			input_report_switch(button->input, SW_LID, state);
+			input_sync(button->input);
 		}
 	}
+
 	/* Send the platform triggered reliable event */
-	if (do_update) {
-		input_report_switch(button->input, SW_LID, !state);
-		input_sync(button->input);
-		button->last_state = !!state;
-		button->last_time = ktime_get();
-	}
+	input_report_switch(button->input, SW_LID, !state);
+	input_sync(button->input);
+	button->last_state = !!state;
+	button->last_time = ktime_get();
+	button->last_is_bios = is_bios_event;
 
 	if (state)
 		pm_wakeup_hard_event(&device->dev);
@@ -444,6 +438,8 @@ static int acpi_button_resume(struct device *dev)
 	struct acpi_button *button = acpi_driver_data(device);
 
 	button->suspended = false;
+	/* ignore replay frequency check between suspend/resume */
+	button->last_is_bios = false;
 	if (button->type == ACPI_BUTTON_TYPE_LID)
 		acpi_lid_initialize_state(device);
 	return 0;
@@ -492,6 +488,8 @@ static int acpi_button_add(struct acpi_device *device)
 			ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
 		button->last_state = !!acpi_lid_evaluate_state(device);
 		button->last_time = ktime_get();
+		/* ignore replay frequency check after boot */
+		button->last_is_bios = false;
 	} else {
 		printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
 		error = -ENODEV;
-- 
2.7.4
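The decision logic of the patched acpi_lid_notify_state() can be modeled in
user space as below. This is a sketch under stated assumptions, not driver
code: time is a plain millisecond counter instead of ktime_t, interval_ms
stands in for lid_report_interval, the input core's own redundancy
filtering is omitted, and struct lid_model, lid_report(), lid_notify() and
lid_resume() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-device state the patch relies on. */
struct lid_model {
	int last_state;          /* last _LID value: 1 = open, 0 = closed */
	long long last_time_ms;  /* time of the last accepted notification */
	bool last_is_bios;       /* was the last notification from the BIOS? */
	long long interval_ms;   /* stands in for lid_report_interval */
	int events[8];           /* SW_LID values reported (1 = closed) */
	size_t n_events;
};

/* Record an SW_LID value handed to the (not modeled) input layer. */
static void lid_report(struct lid_model *m, int sw_lid)
{
	if (m->n_events < sizeof(m->events) / sizeof(m->events[0]))
		m->events[m->n_events++] = sw_lid;
}

/* Returns false if the notification is dropped as a replay. */
static bool lid_notify(struct lid_model *m, int state, bool is_bios_event,
		       long long now_ms)
{
	long long next_report = m->last_time_ms + m->interval_ms;

	assert(state == 0 || state == 1);
	/* Drop a same-state notification within the interval after a BIOS one. */
	if (m->last_is_bios && m->last_state == state && now_ms <= next_report)
		return false;

	/* Complement switch event only for a BIOS-notified repeated "close". */
	if (m->last_state == state && is_bios_event && !state)
		lid_report(m, state);   /* faked "open" (SW_LID = 0) */

	lid_report(m, !state);          /* the platform-triggered event */
	m->last_state = state;
	m->last_time_ms = now_ms;
	m->last_is_bios = is_bios_event;
	return true;
}

/* Mirrors the resume/boot paths: skip the replay check afterwards. */
static void lid_resume(struct lid_model *m)
{
	m->last_is_bios = false;
}
```

Clearing last_is_bios on resume and at probe time means the first BIOS
notification afterwards is never mistaken for a replay, while a faked event
followed by a real BIOS "close" still gets through, which is what the new
comment in acpi_lid_notify_state() describes.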