Hi!
I had a similar issue some time ago on VMware. Try turning off hardware offloading in the VM with ethtool; that needs no outage. Maybe it helps... I think it's something like ethtool -K eth0 rx off tx off gso off (capital -K changes the settings, lowercase -k only lists them).
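In case it is useful, here is a rough sketch of that as a small dry-run script. It assumes the interface is eth0 and that rx, tx and gso are the offloads worth trying first; the helper names are made up for illustration. By default it only prints the command and changes nothing.

```python
import subprocess


def offload_off_cmd(iface, features=("rx", "tx", "gso")):
    """Build an `ethtool -K` command that switches the given offloads off.

    Capital -K changes settings; lowercase -k only lists the current ones.
    """
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def apply(cmd, dry_run=True):
    """Print the command; execute it (needs root) only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Assumption: eth0 is the guest interface in question.
apply(offload_off_cmd("eth0"))
```

Calling apply(..., dry_run=False) as root would actually change the settings. They do not persist across a reboot, so it should be easy to back out.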
-of (mobile)
So, last weekend we rebooted bvirthost09 and db-koji01.
It helped somewhat.
Database dumps are back to a reasonable few hours.
However, the load is still high, it occasionally alerts, and it now
sometimes causes builders to stop talking to the hub. (They time out
and just stop checking in.)
I've asked netapp folks to look and see if they can see any problems
with the iscsi lun that guest is on, but they say they are not aware of
any issues.
I do see some packet dropping on db-koji01:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
    RX: bytes     packets    errors  dropped  overrun  mcast
    65269562146   369724151  0       128685   0        0
    TX: bytes     packets    errors  dropped  carrier  collsns
    395224051163  377221287  0       0        0        0
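For scale, those RX drops are a small fraction of the traffic, roughly 0.03 percent of received packets. A quick sketch of the arithmetic, using only the counters above (the function name is hypothetical, and the kernel may not count dropped packets in the packets total, so treat it as an approximation):

```python
def drop_percent(dropped, packets):
    """Return dropped packets as a percentage of packets received."""
    return 100.0 * dropped / packets


# Counters copied from the eth0 RX line on db-koji01.
rx_packets = 369724151
rx_dropped = 128685

print(f"RX drop rate: {drop_percent(rx_dropped, rx_packets):.4f}%")
```

The counters are cumulative since boot, so comparing two samples taken a few minutes apart would show whether the drops are still happening.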
My only ideas at this point:
a) Run another PostgreSQL VACUUM ANALYZE. Perhaps the first one made
some poor choices and another one would make things happier. In any
case, that shouldn't make things any worse.
b) Switch the network card on db-koji01 to e1000 instead of virtio-net.
This really shouldn't be needed, but perhaps we are hitting some weird
virtio-net bug. This would require a short outage.
c) Some other brilliant idea. ;)
kevin
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure