Dear all,

we run a small cluster [1] that is used exclusively for virtualisation (kvm/libvirt). Recently we started to run into performance problems (slow requests, failing OSDs) for no reason that is obvious to us.

We take nightly snapshots of the VM images and keep them for 14 days. Currently we run 8 VMs in the cluster. At first it looked like the problem was related to snapshotting the images of running VMs (or to deleting those snapshots after 14 days). So we changed the procedure: we now first suspend the VM and then snapshot its image(s). Snapshots are taken at 4 am.

When we removed *all* the old snapshots (the ones taken of running VMs), the cluster suddenly behaved normally again. But after two days of creating snapshots of suspended VMs (without deleting any), the slow requests started again, although far less frequently than before.

This morning 4 of our 6 OSDs failed one after another (e.g. osd.2 IPv4:6800/1621 failed (2 reporters from different host after 49.976472 >= grace 46.444312)), resulting in HEALTH_WARN with up to about 20% of PGs active+undersized+degraded, stale+active+clean or remapped+peering. No OSD failure lasted longer than 4 minutes, and after 15 minutes everything was back to normal. The trouble started at 6:25 am, which is when the cron.daily scripts run here.

We have no clue what could have caused this behavior :( There seems to be no shortage of resources (CPU, RAM, network) that would explain what happened, but maybe we did not look in the right places.
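To make the 14-day retention step concrete, here is a minimal Python sketch of how the snapshots due for deletion could be selected. The naming scheme `nightly-YYYY-MM-DD` and the names `snapshot_date` and `snapshots_to_remove` are hypothetical, chosen only for illustration; the script actually used on this cluster is not shown in this post.

```python
from datetime import date, timedelta

# Hypothetical naming scheme (assumption): one snapshot per night,
# named "nightly-YYYY-MM-DD".
PREFIX = "nightly-"
RETENTION_DAYS = 14


def snapshot_date(name):
    """Return the date encoded in a nightly snapshot name, or None if it doesn't match."""
    if not name.startswith(PREFIX):
        return None
    try:
        return date.fromisoformat(name[len(PREFIX):])
    except ValueError:
        return None


def snapshots_to_remove(names, today, keep_days=RETENTION_DAYS):
    """Return nightly snapshots that are keep_days or more days old, oldest first.

    Names that don't follow the naming scheme are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        d = snapshot_date(name)
        if d is not None and d <= cutoff:
            expired.append((d, name))
    return [name for _, name in sorted(expired)]
```

With the librbd Python bindings, the names could presumably come from `[s["name"] for s in image.list_snaps()]`, and each returned name could then be passed to `image.remove_snap(name)`. That wiring is an assumption and is left out so the sketch runs without a cluster.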
So any hint on where to look or what to look for would be greatly appreciated :)

[1] cluster setup

Three nodes: ceph1, ceph2, ceph3

ceph1 and ceph2
  1x Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz
  32 GB RAM
  RAID1 for OS
  1x Intel 530 Series SSD (120GB) for Journals
  3x WDC WD2500BUCT-63TWBY0 for OSDs (1TB)
  2x Gbit Ethernet bonded (802.3ad) on HP 2920 Stack

ceph3
  virtual machine
  1 CPU
  4 GB RAM

Software
  Debian GNU/Linux Jessie (8.7)
  Kernel 3.16
  ceph 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)

Ceph Services
  3 Monitors: ceph1, ceph2, ceph3
  6 OSDs: ceph1 (3), ceph2 (3)

Regards,
-- 
J.Hofmüller

Nisiti
- Abie Nathan, 1927-2008
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com