Re: fedora cloud AWS randomly shutdown

On 9/26/18 5:03 AM, Neal Becker wrote:
> Rick Stevens wrote:
> 
>> On 9/25/18 12:32 PM, Neal Becker wrote:
>>> I'm using f28 cloud on AWS as a compute farm.  It seems that instances
>>> randomly shutdown within hours of starting.  An example log:
>>>
>>> ...
>>> Fedora 28 (Cloud Edition)
>>> Kernel 4.16.3-301.fc28.x86_64 on an x86_64 (ttyS0)
>>>
>>>          Stopping Restore /run/initramfs on shutdown...
>>> [  OK  ] Removed slice system-sshd\x2dkeygen.slice.
>>>          Stopping User Manager for UID 1000...
>>> ...
>>>
>>> In this case after about 4 hours it seems to have spontaneously shutdown.
>>> This happens with high probability - maybe 2/10 instances I start
>>> spontaneously shutdown.
>>>
>>> Any ideas what's going on?  I'm just wondering if this is something
>>> specific to fedora cloud edition, because it doesn't seem to be a common
>>> complaint on AWS (most of which is ubuntu).
>>
>> Are you getting emails from AWS that they're shutting down your
>> instance? AWS does some testing and, should your instance fail their
>> tests, they will shut it down "to protect others sharing the hardware".
>> If this is what's happening, you should get an email about it (we get
>> one perhaps 20% of the time) and if not, check the AWS admin portal
>> under "Events" right after a restart. There should be a record about it.
>> That record goes away after a while (not sure how long it hangs around).
>>
>> In my experience, AWS is rather vague as to just _what_ tests they use
>> to determine if your instance is dangerous so it can be difficult to fix
>> your code. We've got some AWS stuff that's been up for well over a year,
>> but others they shut down because they fail these mysterious tests.
>>
>> If you're using instance store disks, the disk image is purged when you
>> restart your instance so your logs probably don't contain why the system
>> shut down the last time. The only way to hang onto that stuff is to use
>> persistent (EBS) storage for your machine--at least for the logs (I'd
>> recommend st1-type storage for logs). Persistent storage at AWS can get
>> expensive depending on how big it is, but it may be necessary to sort
>> this out. Once figured out, you can get rid of the EBS storage to
>> minimize costs.
>>
>> This may be a Fedora Cloud issue. It may be something you're doing in an
>> application. It may be AWS protecting itself. Hard to tell.
> 
> Shutdowns occur with very high probability within a few hours.  Like, maybe
> 20% of my machines shut down within a few hours.  I suspect machines with
> a high load average shut down.  But that's not behavior I'd expect from fedora
> workstation!  I'm wondering if there's something about the fedora cloud
> setup causing this?

Please check the AWS portal and see if they're killing your machines or
if they're shutting down of their own accord. And as I said before,
you may need to set up an EBS st1 storage volume and mount it at
/var/log to persist logs across reboots so you can examine them when you
bring the machine back up.
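
If you'd rather script that check than click through the portal, here is a
rough sketch. It assumes boto3 is installed and your AWS credentials are
configured. The helper names (classify_shutdown, report) are made up for
illustration. The reason codes are the ones EC2 reports in StateReason.Code,
and the mapping to "guest", "api" and "aws" is only my reading of them, not
an official classification.

```python
def classify_shutdown(state_reason_code):
    """Rough guess at who stopped an instance, from EC2's StateReason.Code."""
    if not state_reason_code:
        return "unknown"
    if state_reason_code == "Client.InstanceInitiatedShutdown":
        return "guest"  # the OS inside the instance shut itself down
    if state_reason_code == "Client.UserInitiatedShutdown":
        return "api"    # someone stopped or terminated it through the EC2 API
    if state_reason_code.startswith("Server."):
        return "aws"    # e.g. Server.SpotInstanceTermination
    return "unknown"


def report(instance_ids, region):
    """Print state and stop reason for each instance (needs boto3 + credentials)."""
    import boto3  # third-party; imported here so classify_shutdown works without it

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(InstanceIds=instance_ids)
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            code = inst.get("StateReason", {}).get("Code", "")
            print(
                inst["InstanceId"],
                inst["State"]["Name"],
                code,
                inst.get("StateTransitionReason", ""),
                classify_shutdown(code),
            )
```

A "guest" result would point at something inside the instance (Fedora or your
application) asking for the shutdown, while "aws" would point at the platform.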

It might be an idea to set up a small AWS instance with the EBS storage at
/var/log as a log server and have all your other instances log to it.
You'd be able to capture any of your AWS instance logs that way on a
single EBS storage volume.
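
As a rough illustration of the "log to a central server" idea from the
application side, here is a minimal sketch that sends load-average lines to a
remote syslog listener using Python's standard logging module. It covers only
messages your own code emits, not the system journal. It assumes the log
server accepts syslog over UDP (for example rsyslog with its imudp input
enabled) and that your security group allows the port. The logger name "farm"
and the helper names are placeholders.

```python
import logging
import logging.handlers
import os


def make_remote_logger(host, port=514):
    """Return a logger that forwards records to a syslog server over UDP."""
    logger = logging.getLogger("farm")
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("farm: %(message)s"))
    logger.addHandler(handler)
    return logger


def log_load(logger):
    """Send the 1, 5 and 15 minute load averages as one log line."""
    one, five, fifteen = os.getloadavg()  # available on Unix-like systems
    logger.info("loadavg %.2f %.2f %.2f", one, five, fifteen)
```

Calling log_load() every minute or so from each compute instance would leave a
trail on the log server showing how loaded a machine was just before it went
away, which survives even if the instance's own disk does not.
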
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@xxxxxxxxxxxxxx -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
- Politicians are the opposite of pickpockets because you never see  -
-        them take their hand out of your pocket.                    -
-                                             -- Larry Fine          -
----------------------------------------------------------------------