Outage Report: ibiblio02.fedoraproject.org/proxy04.fedoraproject.org

1. What happened?

A new hardware server was installed at our sponsor site Ibiblio
(ibiblio.org), which let us move servers around in order to a) take down
ibiblio01.fedoraproject.org and b) rebuild ibiblio02 on RHEL-7.

During this rebuild, proxy04.fedoraproject.org was moved using the
following steps:
 a. Take proxy04.fedoraproject.org out of DNS.
 b. Turn off the old proxy04 on ibiblio01.
 c. Build proxy04 on the new server.
 d. Put proxy04 back into DNS.

Step c ran into several problems that had to be debugged, which extended
the outage beyond the originally planned 2 hours. First, the host system
came up believing it was inside PHX2 and tried to reach the iSCSI server
there. Second, the configuration file referenced the wrong kickstart,
which caused additional downtime. Third, ordering problems in the general
Ansible playbook caused services to try to start before their config
files had been copied over. Fourth, a new python-fedora was pushed out
that requires a python-six which isn't in EL7 yet. Together these pushed
us past the time window originally allocated.

The final outage problem occurred after step d had been completed. Some
files on the system had the wrong permissions, which left parts of the
website non-functional. Fixing this required taking the server out of
DNS again, correcting the permissions, and rebooting. At that time,
additional CPU and memory were also added in case they would help.

2. What was affected?

Some users saw broken websites: the HTTP server reported that it was up
but was unable to serve pages.

3. How long was the outage?

The server rebuild took 4 hours because of the rebuild problems
described above.

4. What was the root cause?

Multiple papercut problems extended the outage. One possible way to
catch these earlier would be to continuously deploy a dev proxy system,
but that is outside the scope of this email.

Incorrect permissions on some files made them unreadable. Running a
script to fix the permissions resolved the problem.
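The report does not say what the permissions script actually did. As a
rough sketch only, assuming the web content lives under a single
directory tree and the web server reads it as a non-owner user, such a
fix might add group/other read bits to regular files and read+execute
bits to directories. The function names and bit choices below are
hypothetical and are not the script that was used.

```python
import os
import stat

# Hypothetical bit sets: group/other read on files, and group/other
# read+execute on directories so they can be listed and traversed.
FILE_BITS = stat.S_IRGRP | stat.S_IROTH
DIR_BITS = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH


def _add_bits(path, bits):
    """OR `bits` into the mode of `path`; return True if the mode changed."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    if mode & bits == bits:
        return False  # already has all the required bits
    os.chmod(path, mode | bits)
    return True


def fix_permissions(root):
    """Walk `root`, making every directory traversable and every regular
    file readable for group and others. Symlinks are skipped so their
    targets are never touched. Returns the list of paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if _add_bits(dirpath, DIR_BITS):
            changed.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if _add_bits(path, FILE_BITS):
                changed.append(path)
    return changed
```

Because it only adds bits and reports what it changed, running it a
second time on an already-fixed tree returns an empty list, which makes
it safe to re-run after a reboot or redeploy.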

-- 
Stephen J Smoogen.