On Thursday 13 April 2017 12:13 AM, Tero Kristo wrote: > On 12/04/17 20:24, Eduardo Valentin wrote: >> On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote: >>> >>> >>> On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote: >>>> >>>> >>>> On 04/12/2017 11:44 AM, Keerthy wrote: >>>>> >>>>> >>>>> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >>>>>> >>>>>> >>>>>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>>>>>> Hello, >>>>>>> >>>>>> ... >>>>>> >>>>>>> >>>>>>> I agree. But there it nothing that says it is not reenterable. If >>>>>>> you >>>>>>> saw something in this line, can you please share? >>>>>>> >>>>>>>>>> will you generate a patch to do this? >>>>>>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>>>>>> orderly_poweroff is called only once right away. I have already >>>>>>>>> tested. >>>>>>>>> >>>>>>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>>>>>> temperature. >>>>>>>>> I will take some more time to make it and test. >>>>>>>>> >>>>>>>>> Is that okay? Or you want me to send both together? >>>>>>>>> >>>>>>>> I think you can send patch for step 1 first. >>>>>>> >>>>>>> I am happy to see that Keerthy found the problem with his setup >>>>>>> and a >>>>>>> possible solution. But I have a few concerns here. >>>>>>> >>>>>>> 1. If regular shutdown process takes 10seconds, that is a >>>>>>> ballpark that >>>>>>> thermal should never wait. orderly_poweroff() calls run_cmd() >>>>>>> with wait >>>>>>> flag set. That means, if regular userland shutdown takes 10s, we are >>>>>>> waiting for it. Obviously this not acceptable. Specially if you >>>>>>> setup >>>>>>> critical trip to be 125C. Now, if you properly size the critical >>>>>>> trip to >>>>>>> fire before hotspot really reach 125C, for 10s (or the time it >>>>>>> takes to >>>>>>> shutdown), then fine. But based on what was described in this >>>>>>> thread, >>>>>>> his system is waiting 10s on regular shutdown, and his silicon is on >>>>>>> out-of-spec temperature for 10s, which is wrong. >>>>>>> >>>>>>> 2. The above scenario is not acceptable in a long run, specially >>>>>>> from a >>>>>>> reliability perspective. If orderly_poweroff() has a possibility to >>>>>>> simply never return (or take too long), I would say the thermal >>>>>>> subsystem is using the wrong API. >>>> >>>> ^ this question just repeat everything which was already discussed in >>>> previous versions of this patch - orderly_poweroff() is not good for >>>> critical shutdown/poweroff, >>>> but what to use instead? >> >> It is still useful on a properly sized system. The point is the thermal >> subsystem still wants to give one opportunity to gracefully shutdown the >> running system on a thermal scenario, as I explained in the other email. >> But, you have to do this accounting the down time, and your reliability >> concerns. >> >>>> >>>> >>>>>>> >>>>>> >>>>>> >>>>>> Hh, I do not see that orderly_poweroff() will wait for anything now: >>>>>> void orderly_poweroff(bool force) >>>>>> { >>>>>> if (force) /* do not override the pending "true" */ >>>>>> poweroff_force = true; >>>>>> schedule_work(&poweroff_work); >>>>>> ^^^^^^^ async call. even here can be pretty big delay if system is >>>>>> under pressure >>>>>> } >>>>>> >>>>>> >>>>>> static int __orderly_poweroff(bool force) >>>>>> { >>>>>> int ret; >>>>>> >>>>>> ret = run_cmd(poweroff_cmd); >>>>> >>>>> When i tried with multiple orderly_poweroff calls ret was always 0. >>>>> So every 250mS i see this ret = 0. >>>>> >>>>>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >>>>>> >>>>>> if (ret && force) { >>>>> >>>>> So it never entered this path. ret = 0 so if is not executed. >>>> >>>> correct, because exec can find poweroff tool and start it, so you, >>>> most probably, have bunch of this tool instance running in parallel >>>> (some of them can fail or block) >>>> Issue 1 - you've sent fix for is actual :). >>> >>> Precisely yes! >>> >> >> As I mentioned, the fix is a two fold, a. avoid spam of >> orderly_poweroff(), but make sure eventually we shutdown. > > Just chirping in here a bit myself also, the long latencies in the > poweroff executing are basically because in our case it will do all of > the following: > > - stop all running daemons > - kill all remaining processes > - unload all modules > - sync / unmount all filesystems > - etc. > - poweroff the system when everything else has been gracefully done > > The order of these things are not necessarily what I listed above, but > overall it takes quite a bit of time. It doesn't matter if you execute > all of this over NFS or SD card or ramdisk, it is a long procedure. Yes. Thanks for the pointers Tero. As i had mentioned, Here is the log on DRA72-EVM with arago filesystem over nfs on the next branch with my patch to restrict orderly_poweroff to one cycle. http://pastebin.ubuntu.com/24371980/ I triggered thermal shutdown by using THERMAL_EMULATION. 5-10S was on a good run and we can see that with a full size file system over nfs its taking about 30+ seconds to orderly_poweroff. I also profiled a poweroff command timing. That also takes more than 20 Seconds. Here is the log: http://pastebin.ubuntu.com/24372012/ As Eduardo pointed out this is pretty long. I had 2 suggestions for that: 1) To decrease the thermal critical threshold below the actual hardware thermal shutdown threshold. 2) To have a thermal_backup shutdown which uses kernel_power_off when a configured time expires after we have triggered orderly_poweroff and directly shuts off the system. Regards, Keerthy > > -Tero > >> >>>> >>>> Again, thermal has no control of power off process once run_cmd() >>>> is returned, >>>> and it do not know what US poweroff binary is doing and how much >>>> time can it take >>>> (which include disks maintenance - loooong delay). >>>> >>>>> >>>>>> pr_warn("Failed to start orderly shutdown: forcing the >>>>>> issue\n"); >>>>>> >>>>>> /* >>>>>> * I guess this should try to kick off some daemon to sync >>>>>> and >>>>>> * poweroff asap. Or not even bother syncing if we're >>>>>> doing an >>>>>> * emergency shutdown? >>>>>> */ >>>>>> emergency_sync(); >>>>>> kernel_power_off(); >>>>>> ^^^ force power off, but only if run_cmd() failed - for example >>>>>> /sbin/poweroff doesn't exist >>>>>> } >>>>>> >>>>>> return ret; >>>>>> } >>>>>> >>>>>> static bool poweroff_force; >>>>>> >>>>>> static void poweroff_work_func(struct work_struct *work) >>>>>> { >>>>>> __orderly_poweroff(poweroff_force); >>>>>> } >>>>>> >>>>>> As result thermal has no control of power off any more after >>>>>> calling orderly_poweroff() and can get the result >>>>>> of US poweroff binary execution. >>>>>> >>>>>>> >>>>>>> If you are going to implement the above two patches, keep in mind: >>>>>>> i. At least within the thermal subsystem, you need to take care >>>>>>> of all >>>>>>> zones that could trigger a shutdown. >>>>>>> ii. serializing the calls to orderly_poweroff() seams to be more >>>>>>> concerning than cancelling all monitoring. >>>>>>> >>>>>>> >>>>>> >>>> > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html