On 9/26/18 5:03 AM, Neal Becker wrote:
> Rick Stevens wrote:
>
>> On 9/25/18 12:32 PM, Neal Becker wrote:
>>> I'm using f28 cloud on AWS as a compute farm. It seems that instances
>>> randomly shutdown within hours of starting. An example log:
>>>
>>> ...
>>> Fedora 28 (Cloud Edition)
>>> Kernel 4.16.3-301.fc28.x86_64 on an x86_64 (ttyS0)
>>>
>>>          Stopping Restore /run/initramfs on shutdown...
>>> [  OK  ] Removed slice system-sshd\x2dkeygen.slice.
>>>          Stopping User Manager for UID 1000...
>>> ...
>>>
>>> In this case after about 4 hours it seems to have spontaneously shutdown.
>>> This happens with high probability - maybe 2/10 instances I start
>>> spontaneously shutdown.
>>>
>>> Any ideas what's going on? I'm just wondering if this is something
>>> specific to fedora cloud edition, because it doesn't seem to be a common
>>> complaint on AWS (most of which is ubuntu).
>>
>> Are you getting emails from AWS that they're shutting down your
>> instance? AWS does some testing and, should your instance fail their
>> tests, they will shut it down "to protect others sharing the hardware".
>> If this is what's happening, you should get an email about it (we get
>> one perhaps 20% of the time) and if not, check the AWS admin portal
>> under "Events" right after a restart. There should be a record about it.
>> That record goes away after a while (not sure how long it hangs around).
>>
>> In my experience, AWS is rather vague as to just _what_ tests they use
>> to determine if your instance is dangerous so it can be difficult to fix
>> your code. We've got some AWS stuff that's been up for well over a year,
>> but others they shut down because they fail these mysterious tests.
>>
>> If you're using instance store disks, the disk image is purged when you
>> restart your instance so your logs probably don't contain why the system
>> shut down the last time. The only way to hang onto that stuff is to use
>> persistent (EBS) storage for your machine--at least for the logs (I'd
>> recommend st1-type storage for logs). Persistent storage at AWS can get
>> expensive depending on how big it is, but it may be necessary to sort
>> this out. Once figured out, you can get rid of the EBS storage to
>> minimize costs.
>>
>> This may be a Fedora Cloud issue. It may be something you're doing in an
>> application. It may be AWS protecting itself. Hard to tell.
>
> Shutdowns occur with very high probability within few hours. Like, maybe
> 20% of my machines shutdown within a few hours. I suspect machines with
> high load average shutdown. But that's not behavior I'd expect from fedora
> workstation! I'm wondering if there's something about the fedora cloud
> setup causing this?

Please check the AWS portal to see whether AWS is killing your machines
or whether they're shutting down of their own accord. And as I said
before, you may need to set up an EBS st1 storage volume and mount it at
/var/log so your logs persist across reboots and you can examine them
when you bring the machine back up.

It might be an idea to set up a small AWS instance with the EBS storage
at /var/log as a log server and have all your other instances log to it.
That way you'd capture the logs from all your AWS instances on a single
EBS storage volume.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@xxxxxxxxxxxxxx -
- AIM/Skype: therps2         ICQ: 226437340          Yahoo: origrps2 -
-                                                                    -
- Politicians are the opposite of pickpockets because you never see  -
-              them take their hand out of your pocket.              -
-                                                     -- Larry Fine  -
----------------------------------------------------------------------
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
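Rick's first suggestion, finding out whether AWS stopped an instance or the guest shut itself down, can also be checked from a script. The sketch below assumes the dictionaries returned by boto3's `describe_instances()` and `describe_instance_status()` calls. It reads each instance's `StateReason` code (for example `Client.InstanceInitiatedShutdown` when the operating system itself shut down, or a `Server.*` code when EC2 stopped it) and lists any scheduled events. The helper names and the origin labels are hypothetical, not AWS terminology.

```python
"""Sketch: classify why EC2 instances stopped, from boto3 response dicts.

Assumes `response` comes from ec2.describe_instances() and
`status_response` from ec2.describe_instance_status(). The origin labels
("guest-os", "api-user", "aws", ...) are this sketch's own, not AWS terms.
"""


def stop_origin(reason_code):
    """Map an EC2 StateReason code to a rough, hypothetical origin label."""
    if not reason_code:
        return "unknown"
    if reason_code == "Client.InstanceInitiatedShutdown":
        # The guest OS itself ran a shutdown/poweroff.
        return "guest-os"
    if reason_code == "Client.UserInitiatedShutdown":
        # Someone asked EC2 (console, CLI, API) to stop the instance.
        return "api-user"
    if reason_code.startswith("Server."):
        # EC2 side: scheduled stop, Spot interruption, internal error, ...
        return "aws"
    return "other-client"


def summarize_instances(response):
    """Return {instance_id: (state, reason_code, origin)}."""
    summary = {}
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            code = inst.get("StateReason", {}).get("Code")
            state = inst.get("State", {}).get("Name", "unknown")
            summary[inst["InstanceId"]] = (state, code, stop_origin(code))
    return summary


def scheduled_events(status_response):
    """Return {instance_id: [(event_code, description), ...]} for instances
    that have at least one event listed."""
    events = {}
    for status in status_response.get("InstanceStatuses", []):
        found = [(e.get("Code"), e.get("Description", ""))
                 for e in status.get("Events", [])]
        if found:
            events[status["InstanceId"]] = found
    return events
```

With boto3 installed and credentials configured, something like `summarize_instances(boto3.client("ec2").describe_instances())` and `scheduled_events(boto3.client("ec2").describe_instance_status(IncludeAllInstances=True))` would feed it live data. A `guest-os` result would point at something inside the Fedora image or the workload rather than at AWS.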
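Before relying on logs to survive a restart, it is worth confirming which filesystem actually holds /var/log once the separate volume is mounted. A minimal sketch, assuming a Linux host where /proc/mounts is readable: it picks the longest mount point that contains the given path. The function name is hypothetical, and it only reports the device; whether that device is an EBS volume or an instance store disk still has to be checked on the AWS side.

```python
import os


def backing_mount(path, mounts_text):
    """Return (device, mount_point, fstype) of the mount containing `path`.

    `mounts_text` is the content of /proc/mounts: one mount per line with
    device, mount point, fs type, options, dump and pass fields.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[0], fields[1], fields[2]
        # /proc/mounts escapes spaces in paths as \040; undo that one case.
        mount_point = mount_point.replace("\\040", " ")
        inside = (path == mount_point
                  or path.startswith(mount_point.rstrip("/") + "/"))
        # Longest matching mount point wins; on a tie the later line wins,
        # because a later mount on the same point shadows the earlier one.
        if inside and (best is None or len(mount_point) >= len(best[1])):
            best = (device, mount_point, fstype)
    return best
```

On the instance, reading /proc/mounts with `open()` and passing its content along with `"/var/log"` returns a (device, mount point, filesystem type) tuple. If the mount point comes back as `/`, the dedicated /var/log volume is not mounted and logs are still going to the root disk.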
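The central log server idea boils down to every instance sending its log records over the network to one host whose /var/log lives on EBS. A real deployment would more likely run rsyslog on both ends; the sketch below only illustrates the mechanism with Python's standard library, using `logging.handlers.SysLogHandler` on the sending side and a toy `socketserver.UDPServer` collector that keeps datagrams in memory. `make_collector`, `make_remote_logger`, and the default port 514 (the conventional syslog UDP port) are assumptions of the sketch, not anyone's actual setup.

```python
import logging
import logging.handlers
import socketserver


class CollectorHandler(socketserver.BaseRequestHandler):
    """Toy collector: keeps every received syslog datagram in memory.

    A real log server would write to files on the EBS-backed /var/log
    (or, more likely, just run rsyslog).
    """

    def handle(self):
        data, _sock = self.request  # UDP requests are (bytes, socket) pairs
        self.server.received.append(data.decode("utf-8", errors="replace"))


def make_collector(host="127.0.0.1", port=0):
    """Bind a hypothetical UDP collector; port=0 lets the OS pick a port."""
    server = socketserver.UDPServer((host, port), CollectorHandler)
    server.received = []
    return server


def make_remote_logger(name, host, port=514):
    """Return a logger that ships INFO+ records as syslog datagrams over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

On the log host the collector would be bound to a reachable address and left running with `serve_forever()` (binding port 514 normally requires root). Plain UDP syslog gives no delivery guarantee, but records that do arrive stay on the log host even if the sending instance disappears.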