So, last weekend we rebooted bvirthost09 and db-koji01. It helped somewhat: database dumps are back to a reasonable few hours.

However, it still has high load and occasionally alerts, and now it is also sometimes causing builders to stop talking to the hub. (They time out and just stop checking in.)

I've asked the NetApp folks to check whether they can see any problems with the iSCSI LUN that guest is on, but they say they are not aware of any issues.

I do see some packet dropping on db-koji01:

    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
        link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
        RX: bytes    packets   errors dropped overrun mcast
        65269562146  369724151 0      128685  0       0
        TX: bytes    packets   errors dropped carrier collsns
        395224051163 377221287 0      0       0       0

My only ideas at this point:

a) Run another PostgreSQL vacuum analyze. Perhaps the first one made some poor choices and another one would make things happier. In any case, that shouldn't make things any worse.

b) Switch the network card on db-koji01 to e1000 instead of virtio-net. This really shouldn't be needed, but perhaps we are hitting some weird virtio-net bug. This would require a short outage.

c) Some other brilliant idea. ;)

kevin
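
To put the dropped-packet count in proportion, here is a minimal sketch that works out the RX drop ratio from the counters quoted above. The helper name `parse_counters` is hypothetical and exists only for this illustration; it assumes the two-line header/values layout shown in the interface statistics and does not interpret what the kernel counts as "dropped".

```python
def parse_counters(header: str, values: str) -> dict:
    """Pair counter names from a header line with the integers below it.

    Hypothetical helper: assumes a header such as
    "RX: bytes packets errors dropped overrun mcast"
    followed by a line of whitespace-separated integers.
    """
    names = header.split()[1:]              # drop the leading "RX:" / "TX:" label
    numbers = [int(v) for v in values.split()]
    return dict(zip(names, numbers))


# Counters copied from the db-koji01 eth0 statistics in the message.
rx = parse_counters(
    "RX: bytes packets errors dropped overrun mcast",
    "65269562146 369724151 0 128685 0 0",
)
tx = parse_counters(
    "TX: bytes packets errors dropped carrier collsns",
    "395224051163 377221287 0 0 0 0",
)

# Ratio of dropped RX packets to the RX packet counter.
ratio = rx["dropped"] / rx["packets"]
print(f"RX dropped: {rx['dropped']} of {rx['packets']} ({ratio:.4%})")
print(f"TX dropped: {tx['dropped']} of {tx['packets']}")
```

On these numbers the RX drop ratio is a few hundredths of a percent. The counters are cumulative, so sampling them twice some minutes apart would show whether drops are still accumulating or date from an earlier event.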