Freeze break: db-koji01 and bvirthost09 reboot

Kevin Fenzi <kevin@xxxxxxxxx> · Fri, 10 Apr 2015 11:04:55 -0600

I was going to wait until after freeze for this, but with us slipping a
week I think it might be worth doing now. 

For the last few weeks we have been having issues with db-koji01. 
The problem started when I moved its backend storage from one iscsi/pv
to another iscsi/pv. The load has been high since then and it's not as
performant as it was. 

Effects: 

* koji alerts in nagios mean we have to restart httpd on koji01 (which
  we can do without an outage, but it means a human has to wake up and
  go do it).

* If koji01 httpd isn't restarted, kojira will sometimes time out and
  not launch newrepos. (We worked around this by increasing the
  timeout, but it's only a matter of time before it hits this again.)

* Pages on koji that need lots of db access are slower than they
  were/need to be. 

Cause: 

I'm not entirely sure what the base cause is. lvdisplay shows the guest
is on the right iscsi volume, and there are no iscsi errors or the like.
The host did have stale lvm data due to lvmetad running, but that
shouldn't have affected the running guest(s). I can only think there's
something still trying to hit the old, no-longer-used iscsi volume and
causing extra load.
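
For anyone who wants to double-check the lvdisplay result, one option is
to ask LVM which devices back each logical volume and flag anything
still sitting on the old PV. Below is a minimal Python sketch of that
check. It uses lvs rather than lvdisplay, and assumes its input is the
text printed by
`lvs --noheadings --separator '|' -o lv_name,vg_name,devices`.
The volume group and device names in the sample are made up for
illustration and are not taken from bvirthost09. This only shows LVM's
view of things; it would not by itself explain the extra load.

```python
def lvs_on_device(lvs_output, device):
    """Return (vg, lv) pairs that have at least one segment on `device`.

    `lvs_output` is expected to be the text printed by:
        lvs --noheadings --separator '|' -o lv_name,vg_name,devices
    """
    matches = []
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        lv_name, vg_name, devices = line.split("|", 2)
        # The devices field looks like "/dev/sdb(0)" or, for several
        # devices, "/dev/sdb(0),/dev/sdc(0)"; drop the "(extent)" suffix.
        backing = {d.split("(", 1)[0] for d in devices.split(",")}
        # An LV with several segments is printed once per segment,
        # so avoid reporting it twice.
        if device in backing and (vg_name, lv_name) not in matches:
            matches.append((vg_name, lv_name))
    return matches


# Hypothetical sample output; every name here is a placeholder.
SAMPLE = """
  db-koji01|vg_guests_new|/dev/sdc(0)
  dhcp01|vg_guests_new|/dev/sdc(25600)
  scratch|vg_guests_old|/dev/sdb(0)
"""

# List whatever is still allocated on the (hypothetical) old device.
print(lvs_on_device(SAMPLE, "/dev/sdb"))
```

An empty result for the old device would agree with what lvdisplay
shows, and would point the search toward something outside LVM.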

What I would like to do: 

* Stop postgres on db-koji01. This will cause the hub to show db down
  to anyone looking. 

* rsync /var/lib/pgsql off to backup03. This should take less than
  10min. 

* Shut down db-koji01 and dhcp01.

* Reboot bvirthost09.

* See if the issue clears up. If something happens and db-koji01
  doesn't come back up right, we can make a new one and
  sync /var/lib/pgsql back to it and be back up pretty quickly.
  Hopefully it won't come to that. 
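
To make the ordering of the steps above easy to review, here is a
dry-run sketch in Python that only builds and prints the command
sequence and runs nothing. Several details are assumptions and not from
the plan itself: the `postgresql` service name and the use of systemctl,
the rsync destination path on backup03, and the libvirt domain names
being the same as the short hostnames.

```python
import shlex

# Hypothetical destination; the plan only says "backup03".
BACKUP_DEST = "backup03:/srv/backup/db-koji01-pgsql/"


def build_plan(guests=("db-koji01", "dhcp01"), backup_dest=BACKUP_DEST):
    """Return the maintenance steps as (host, argv) pairs, in order."""
    plan = [
        # Assumption: postgres is a systemd unit named "postgresql".
        ("db-koji01", ["systemctl", "stop", "postgresql"]),
        # Copy the data directory only after postgres has stopped,
        # so the files on disk are consistent.
        ("db-koji01", ["rsync", "-a", "/var/lib/pgsql/", backup_dest]),
    ]
    # Shut the guests down cleanly from the virthost before rebooting it.
    # Assumption: libvirt domain names match the short hostnames.
    plan += [("bvirthost09", ["virsh", "shutdown", g]) for g in guests]
    plan.append(("bvirthost09", ["systemctl", "reboot"]))
    return plan


def render(plan):
    """Format the plan as shell-quoted lines for review; runs nothing."""
    return [
        "{}: {}".format(host, " ".join(shlex.quote(a) for a in argv))
        for host, argv in plan
    ]


for line in render(build_plan()):
    print(line)
```

Keeping the plan as data makes it simple to paste into the +1 thread, or
to hand to whoever ends up running the commands by hand.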

I'd like to schedule this possibly over the weekend, during off hours
when koji isn't all that busy.

Thoughts? +1s?

kevin
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure