Lessons learned: Initial check/step list for updates

 0. Plan time in infrastructure@xxxxxxxxxxxxxxxxxxxxxxxx
 1. Open ticket on infrastructure for downtime.
      Updates will occur during day
      Reboots will occur during evening
 2. Send email to devel-announce, announce, infrastructure
 3. Update servers during working hours and work out issues in ticket.
   ** releng updates the following boxes:
       cvs01, pkgs01, nfs01, bnfs01, bxen*,
       x86-*, ppc*, koji*, db03, xb-01,
       compose-*, sign-vault01
 4. Change DNS to turn off proxy on bodhost01 (or similar external
      proxy server).
 5. Reboot bodhost01
 6. Confirm the proxy is working on bodhost01 and fix any issues.
 7. Change proxy DNS to point only at bodhost01
 8. Turn off nagios for servers.
 9. Turn off nagios-external for services.

10. Reboot order matters:
11.   releng deals with the boxes listed above unless told otherwise.
12.   reboot database servers first
        xen15: db02
        xen12: db01
13.   reboot PHX2 boxes
        xen03:
        xen04:
        xen06:
        xen07:
        xen09:
        xen10:
        xen11:
        xen13:
        backup01:
14.   reboot Outside boxes (can be in parallel to PHX2)
        cnode01:
        cnode02:
        cnode03:
        ibiblio01:
        internetx01:
        osuosl01:
        people01:
        serverbeach1:
        serverbeach2:
        serverbeach3:
        serverbeach4:
        serverbeach5:
        telia1:
        tummy1:
15.   reboot bastion.fedoraproject.org
        log into bastion01 from an outside system
        log into bastion02 from an outside system
        log into xen05 from bastion01
        bastion01:
          sudo /usr/sbin/puppetd --disable
          sudo /sbin/service openvpn start
        bastion02:
          sudo /sbin/service openvpn start
        xen05:
          sudo /sbin/shutdown -r now
        once xen05/bastion02 is back up, restore bastion01:
          sudo /sbin/service openvpn stop
          sudo /usr/sbin/puppetd --enable
16.   reboot puppet01
        log into bastion02 from an outside system
        ssh xen14
           sudo /sbin/shutdown -r now
17.   re-enable DNS for proxy servers
        test proxy servers from puppet01
        edit DNS in the puppet git repository
        make ns1
18.   re-enable nagios on internal/external
19. Set up transifex agent on app servers: app01 app02 app03 app04 app07
     sudo -u transifex /var/lib/transifex/ssh-add.sh -f
20. Log and report problems to list.
21. Close ticket.
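
The reboot ordering in steps 12 and 13 could be sketched roughly as below. This is
only an illustration, not the procedure actually used: the function name reboot_group
and the DRY_RUN switch are made up for this example, the short host names are copied
from the list above and would likely need full domain names and a hop through a
bastion host, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the reboot order: database hosts first, then PHX2.
# reboot_group and DRY_RUN are hypothetical names, not existing tooling.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands (the default here)

# Host lists copied from steps 12 and 13 of the checklist.
DB_HOSTS="xen15 xen12"
PHX2_HOSTS="xen03 xen04 xen06 xen07 xen09 xen10 xen11 xen13 backup01"

# reboot_group LABEL HOST...
# Walks the hosts in the order given, one at a time.
reboot_group() {
    label=$1
    shift
    echo "== $label =="
    for host in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: ssh $host sudo /sbin/shutdown -r now"
        else
            ssh "$host" sudo /sbin/shutdown -r now
        fi
    done
}

# The host lists are left unquoted on purpose so they split into words.
reboot_group "database hosts" $DB_HOSTS
reboot_group "PHX2 hosts" $PHX2_HOSTS
```

Keeping each group in its own call makes the order explicit, and the outside boxes
from step 14 could be handled by a second copy of the script run in parallel.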







-- 
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure


