Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

> On 17.01.2023 at 22:30, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> 
> 
> 
> On Tue, Jan 17, 2023, at 11:51 AM, Peter Boy wrote:
>>> On 16.01.2023 at 13:23, Lennart Poettering <mzerqung@xxxxxxxxxxx> wrote:
>>> 
>>> Just to say this clearly btw: when we introduced the time-out initially
>>> we were coming from sysvinit where no such time-out existed at
>>> all. Hence we picked a conservative (i.e. overly long) value to not
>>> upset things too badly. And yes, some people were very much upset we
>>> now defaulted to a time-out.
>>> 
>>> If we'd start from scratch without sysvinit heritage, I think we
>>> would have started with something much much lower right-away.
>> 
>> When introducing a timeout, you obviously had the grace to choose a 
>> fairly conservative  (i.e. cautious) default value that did not lead to 
>> major problems. Would be interesting what would have been if you had 
>> started with 15 sec.
> 
> Why? it was 0 sec before systemd.

As far as I understood Lennart, SysV had no timeout that killed a hanging process. But that is not the relevant point.

> If anything, the time out behavior is masking problems with services not shutting down in a timely manner.

It's not necessarily that. It is only one of at least 2 possibilities. 

One possibility is indeed that a service "hangs" and therefore does not terminate in a timely manner. That is a bug or inappropriate programming in the service. There is no point in waiting for such a service; you have to abort, the sooner the better.

The other possibility, especially on a heavily loaded server, is that processes impede each other in the special situation of a shutdown, through resource bottlenecks or resource contention. This does not depend on the individual service but on the multitude of services and their interdependencies. The process is not deterministic; it is driven by chance. The time required for a single event, i.e. an individual shutdown, is not predictable. At best, one can approximate a range. Once that range is exceeded, it becomes increasingly improbable that the shutdown is proceeding without fault, and there is no point in waiting for any service any longer. No further improvement can be expected. You have to abort.

Unfortunately, we have no data in this case, only different "feelings". We can't estimate a plausible range; we can only guess. And in the case of a server, we might accept waiting a little longer in light of the potential for major follow-on issues.

So, the current decision is not optimal, but OK and manageable.
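
To illustrate "manageable": whatever default the distribution ships, an administrator can override it locally. A minimal sketch, assuming a drop-in file whose name (50-stop-timeout.conf) and value (2min) are placeholders only, not a recommendation:

```ini
# /etc/systemd/system.conf.d/50-stop-timeout.conf  (hypothetical file name)
[Manager]
# System-wide default stop timeout for units that do not set their own.
# The value is an example only.
DefaultTimeoutStopSec=2min
```

The setting is read by the systemd manager itself, so it applies after the next reboot at the latest.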



>> The way it is proposed it doesn’t make a lot of sense. Desktops and 
>> servers work very differently and have different requirements. For 
>> servers, this proposal in its present form makes no sense at all, and 
>> is on the contrary dangerous.
> 
> Why? It's been said in this thread that servers come with a higher expectation of rebooting upon request rather than indefinitely hanging, in contrast to desktops where there can be some tolerance for delay in exchange for safety.

Maybe I don’t fully understand this due to translation issues. On a server, a reboot is a rare event. Ideally it is up 24/7/365. If I suffer the misfortune of having to reboot the server, it doesn't matter whether it takes 45 sec, 2 min or 5 min. All important services are redundant, so there is no total outage. And the BIOS processing at startup often takes longer than any (regular) shutdown process. So a 15 sec timeout instead of 2 min brings no noticeable improvement. The most important thing is to get back up without any damage.


> What I've seen on Fedora Server when there are services that hold things up is invariably sshd does immediately quit so now I can't even log back in to find out what's holding up the reboot. It's quite substantially a worse Ux than on the desktop. I mean, ostensibly I know what I'm doing on my own server and don't need to be second guessed like a desktop user.

Yes, it's pretty annoying that ssh always reliably stops immediately, unlike all other processes. It would be most helpful if systemd terminated ssh last.

> At least postgresql and libvirtd are configured to inhibit reboot/shutdown indefinitely until they properly quit. Services can opt into this behavior, overriding the default. But indefinite delay would  pose a bigger problem on server than on desktops, due to the loss of any feedback and control.

Agreed. As far as I have read the posts, nobody voted for an indefinite delay. It's all about how long each of us is willing to wait, and how serious the possible damage is.
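
For reference, a sketch of how a single service can opt out of the default timeout. The unit name example.service and the 5min value are assumptions for illustration; this is not the actual configuration of postgresql or libvirtd:

```ini
# /etc/systemd/system/example.service.d/override.conf  (hypothetical unit name)
[Service]
# Give this one service more time to stop cleanly than the system-wide default.
# "infinity" would disable the stop timeout for this unit entirely.
TimeoutStopSec=5min
```

Running `systemctl edit example.service` creates such a drop-in and reloads the unit afterwards.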





--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
pboy@xxxxxxxxxxxxxxxxx

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast

