Rick Stevens wrote:

> On 9/25/18 12:32 PM, Neal Becker wrote:
>> I'm using f28 cloud on AWS as a compute farm. It seems that instances
>> randomly shut down within hours of starting. An example log:
>>
>> ...
>> Fedora 28 (Cloud Edition)
>> Kernel 4.16.3-301.fc28.x86_64 on an x86_64 (ttyS0)
>>
>> Stopping Restore /run/initramfs on shutdown...
>> [  OK  ] Removed slice system-sshd\x2dkeygen.slice.
>> Stopping User Manager for UID 1000...
>> ...
>>
>> In this case after about 4 hours it seems to have spontaneously shut down.
>> This happens with high probability - maybe 2/10 instances I start
>> spontaneously shut down.
>>
>> Any ideas what's going on? I'm just wondering if this is something
>> specific to fedora cloud edition, because it doesn't seem to be a common
>> complaint on AWS (most of which is ubuntu).
>
> Are you getting emails from AWS that they're shutting down your
> instance? AWS does some testing and, should your instance fail their
> tests, they will shut it down "to protect others sharing the hardware".
> If this is what's happening, you should get an email about it (we get
> one perhaps 20% of the time) and if not, check the AWS admin portal
> under "Events" right after a restart. There should be a record about it.
> That record goes away after a while (not sure how long it hangs around).
>
> In my experience, AWS is rather vague as to just _what_ tests they use
> to determine if your instance is dangerous so it can be difficult to fix
> your code. We've got some AWS stuff that's been up for well over a year,
> but others they shut down because they fail these mysterious tests.
>
> If you're using instance store disks, the disk image is purged when you
> restart your instance so your logs probably don't contain why the system
> shut down the last time. The only way to hang onto that stuff is to use
> persistent (EBS) storage for your machine--at least for the logs (I'd
> recommend st1-type storage for logs). Persistent storage at AWS can get
> expensive depending on how big it is, but it may be necessary to sort
> this out. Once figured out, you can get rid of the EBS storage to
> minimize costs.
>
> This may be a Fedora Cloud issue. It may be something you're doing in an
> application. It may be AWS protecting itself. Hard to tell.

Shutdowns occur with very high probability within a few hours of launch:
maybe 20% of my machines shut down in that window.

I suspect it's the machines with a high load average that shut down. But
that's not behavior I'd expect from Fedora Workstation!

I'm wondering if there's something about the Fedora Cloud setup causing this?
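
One way to narrow down who initiated a shutdown (AWS, the guest OS, or a user) might be to look at the stop reason that EC2 records for each instance. The following is only a rough sketch: it assumes you have the parsed JSON output of `aws ec2 describe-instances` in hand and reads the `State`, `StateReason` and `StateTransitionReason` fields from it. The function name `shutdown_reasons` and the sample data (instance IDs, timestamps) are made up for illustration.

```python
def shutdown_reasons(response):
    """Map instance ID -> (state name, state reason code, transition reason).

    `response` is assumed to be the parsed JSON from
    `aws ec2 describe-instances`. Missing fields come back as None or "".
    """
    reasons = {}
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            state_reason = inst.get("StateReason", {})
            reasons[inst["InstanceId"]] = (
                inst.get("State", {}).get("Name"),
                state_reason.get("Code"),
                inst.get("StateTransitionReason", ""),
            )
    return reasons


# Made-up sample in the shape of a describe-instances response
# (hypothetical IDs and timestamp, not taken from a real account).
SAMPLE = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0123456789abcdef0",
                    "State": {"Name": "stopped"},
                    "StateReason": {
                        "Code": "Client.InstanceInitiatedShutdown",
                        "Message": "Client.InstanceInitiatedShutdown: "
                                   "Instance initiated shutdown",
                    },
                    "StateTransitionReason":
                        "User initiated (2018-09-25 20:14:03 GMT)",
                },
                {
                    "InstanceId": "i-0fedcba9876543210",
                    "State": {"Name": "running"},
                    "StateTransitionReason": "",
                },
            ]
        }
    ]
}

if __name__ == "__main__":
    for instance_id, (state, code, why) in shutdown_reasons(SAMPLE).items():
        print(instance_id, state, code, why)
```

To run it against real data, replace SAMPLE with the result of `json.load()` on a saved copy of the CLI output. A reason code starting with `Client.` would suggest the shutdown came from inside the instance or from a user rather than from AWS, which is the distinction in question here.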