Hi Shuah, > On Sep 20, 2024, at 8:07 AM, Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx> wrote: > > On 9/19/24 17:09, Yifei Liu wrote: >> Hi Shuah, >>> On Sep 19, 2024, at 11:36 AM, Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx> wrote: >>> >>> On 9/17/24 16:48, Yifei Liu wrote: >>>> We recently notice that the step_after_suspend_test would >>>> fail on our plenty devices. The test believesit failed to >>> >>> What are "plenty devices" >>> recently noticed? >> We have seen this issue on more than one platforms, including bare metal device and virtual machine. >>> Typo - believesit? >> Yes, missing a space here. It should be >> The test believes itself ffailed to enter suspend state. >> Thank you for pointing out. >>> >>>> enter suspend state with >>>> $ sudo ./step_after_suspend_test >>>> TAP version 13 >>>> Bail out! Failed to enter Suspend state >>>> However, in the kernel message, I indeed see the system get >>>> suspended and then wake up later. >>>> [611172.033108] PM: suspend entry (s2idle) >>>> [611172.044940] Filesystems sync: 0.006 seconds >>>> [611172.052254] Freezing user space processes >>>> [611172.059319] Freezing user space processes completed (elapsed 0.001 seconds) >>>> [611172.067920] OOM killer disabled. >>>> [611172.072465] Freezing remaining freezable tasks >>>> [611172.080332] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) >>>> [611172.089724] printk: Suspending console(s) (use no_console_suspend to debug) >>>> [611172.117126] serial 00:03: disabled >>>> --- some other hardware get reconnected --- >>>> [611203.136277] OOM killer enabled. >>>> [611203.140637] Restarting tasks ... >>>> [611203.141135] usb 1-8.1: USB disconnect, device number 7 >>>> [611203.141755] done. >>>> [611203.155268] random: crng reseeded on system resumption >>>> [611203.162059] PM: suspend exit >>>> After investigation, I notice that for the code block >>>> if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem")) >>>> ksft_exit_fail_msg("Failed to enter Suspend state\n"); >>>> The write will return -1 and errno is set to 16 (device busy). >>>> It should be caused by the write function is not successfully returned >>>> before the system suspend and the return value get messed when waking up. >>>> As a result, It may be better to check the time passed of those few instructions >>>> to determine whether the suspend is executed correctly for it is pretty hard to >>>> execute those few lines for 4 seconds, or even more if it is not long enough. >>> >>> I don't think this is the right fix. Can you change this to do echo instead. >>> It does the same thing, but it goes through sysfs interface instead of direct >>> write: >>> >>> ret = system("echo mem > /sys/power/state”); >> Sure, I can do that. >>> >>>> Fixes: bfd092b8c2728 ("selftests: breakpoint: add step_after_suspend_test") >>>> Reported-by: Sinadin Shan <sinadin.shan@xxxxxxxxxx> >>>> Signed-off-by: Yifei Liu <yifei.l.liu@xxxxxxxxxx> >>>> --- >>>> .../selftests/breakpoints/step_after_suspend_test.c | 9 +++++++-- >>>> 1 file changed, 7 insertions(+), 2 deletions(-) >>>> diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c >>>> index dfec31fb9b30d..d615f091e5bae 100644 >>>> --- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c >>>> +++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c >>>> @@ -18,6 +18,7 @@ >>>> #include <sys/timerfd.h> >>>> #include <sys/types.h> >>>> #include <sys/wait.h> >>>> +#include <time.h> >>>> #include "../kselftest.h" >>>> @@ -133,6 +134,7 @@ void suspend(void) >>>> int timerfd; >>>> int err; >>>> struct itimerspec spec = {}; >>>> + clock_t t; >>>> if (getuid() != 0) >>>> ksft_exit_skip("Please run the test as root - Exiting.\n"); >>>> @@ -152,8 +154,11 @@ void suspend(void) >>>> if (err < 0) >>>> ksft_exit_fail_msg("timerfd_settime() failed\n"); >>> >>> I don't think you will need to add clock() code. timerfd_settime() >>> sets the time for 5 seconds and you can simply extend the alarm >>> time. >>> >>> There needs to be some logic to check timer elapse and poll the >>> timer_fd >> Yes, it is also a choice to check the remaining time of the timer. We could use timerfd_gettime() to get the remaining time. The timer would not rearm because the it_interval value is not set (set to 0). Then if the remaining time has both 0 or seconds and nano-seconds (tv_sec and tv_nsec), it should point out the time goes for at least 5 seconds. If the system fails to enter suspend state and waked up by the timer, it should not take 5 whole seconds or more to get to the check line. > > Right, You will have to check the remaining time. There is an example in > tools/testing/selftests/timers/alarmtimer-suspend.c you can refer to. I checked the remaining value, if the suspend succeed, it would be all 0 (0s and 0 ns). If it failed to enter suspend by any chance, it would be a non-zero value. > >> Would you prefer this methods? > > I am curious what happens with starting suspend with sysfs first and then > explore checking elapsed timer. > > system("echo mem > /sys/power/state”); I checked this, and find that it will give a similar failure. sh: line 1: echo: write error: Device or resource busy The console command will get its return result after the suspend and wake up. Maybe we could do system("echo mem > /sys/power/state >/dev/nul”); To avoid an error message in the console. And we are checking the success of suspend by the time, so it would not be necessary for us to see the output of this console command. > > thanks, > -- Shuah Yifei