On 04/12/2017 10:44 AM, Eduardo Valentin wrote: > Hello, > ... > > I agree. But there it nothing that says it is not reenterable. If you > saw something in this line, can you please share? > >>>> will you generate a patch to do this? >>> Sure. I will generate a patch to take care of 1) To make sure that >>> orderly_poweroff is called only once right away. I have already >>> tested. >>> >>> for 2) Cancel all the scheduled work queues to monitor the >>> temperature. >>> I will take some more time to make it and test. >>> >>> Is that okay? Or you want me to send both together? >>> >> I think you can send patch for step 1 first. > > I am happy to see that Keerthy found the problem with his setup and a > possible solution. But I have a few concerns here. > > 1. If regular shutdown process takes 10seconds, that is a ballpark that > thermal should never wait. orderly_poweroff() calls run_cmd() with wait > flag set. That means, if regular userland shutdown takes 10s, we are > waiting for it. Obviously this not acceptable. Specially if you setup > critical trip to be 125C. Now, if you properly size the critical trip to > fire before hotspot really reach 125C, for 10s (or the time it takes to > shutdown), then fine. But based on what was described in this thread, > his system is waiting 10s on regular shutdown, and his silicon is on > out-of-spec temperature for 10s, which is wrong. > > 2. The above scenario is not acceptable in a long run, specially from a > reliability perspective. If orderly_poweroff() has a possibility to > simply never return (or take too long), I would say the thermal > subsystem is using the wrong API. > Hh, I do not see that orderly_poweroff() will wait for anything now: void orderly_poweroff(bool force) { if (force) /* do not override the pending "true" */ poweroff_force = true; schedule_work(&poweroff_work); ^^^^^^^ async call. even here can be pretty big delay if system is under pressure } static int __orderly_poweroff(bool force) { int ret; ret = run_cmd(poweroff_cmd); ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? */ emergency_sync(); kernel_power_off(); ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist } return ret; } static bool poweroff_force; static void poweroff_work_func(struct work_struct *work) { __orderly_poweroff(poweroff_force); } As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result of US poweroff binary execution. > > If you are going to implement the above two patches, keep in mind: > i. At least within the thermal subsystem, you need to take care of all > zones that could trigger a shutdown. > ii. serializing the calls to orderly_poweroff() seams to be more > concerning than cancelling all monitoring. > > -- regards, -grygorii -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html