I can almost guarantee that what you're seeing is PG subfolder
splitting. When a subfolder in a PG reaches a configured number of
objects, it splits into 16 new subfolders. Every cluster I manage
gets blocked requests and OSDs that are marked down while this is
happening. To stop the OSDs from being marked down, I increase
osd_heartbeat_grace until the OSDs no longer mark themselves down
during this process. Based on your email, 5 minutes looks like a good
place to start. The blocked requests will still persist, but at least
the OSDs aren't being marked down regularly and adding peering to the
headache.
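For reference, here is a rough sketch of what I do (the 300 second
value is only an example based on your grace numbers):

    # check the current grace on a running OSD (via its admin socket)
    ceph daemon osd.0 config get osd_heartbeat_grace

    # raise it at runtime on all OSDs, and on the mons, which enforce
    # the same setting when deciding to mark an OSD down
    ceph tell osd.* injectargs '--osd_heartbeat_grace 300'
    ceph tell mon.* injectargs '--osd_heartbeat_grace 300'

    # and persist it in ceph.conf under [global]:
    #   osd heartbeat grace = 300

The splitting itself is governed by filestore_merge_threshold and
filestore_split_multiple; a subfolder splits once it holds roughly
16 * filestore_split_multiple * abs(filestore_merge_threshold)
objects.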
In 10.2.5 and 0.94.9, a way was added to take an OSD offline and tell
it to split the subfolders of its PGs. I haven't done this myself
yet, but I plan to figure it out the next time I come across this
sort of behavior.
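If I'm reading the release notes right, it's the ceph-objectstore-tool
'apply-layout-settings' operation. Untested by me, so treat the
following as a sketch only ($ID and the pool name are placeholders,
and the paths assume a default filestore layout with systemd units):

    ceph osd set noout
    systemctl stop ceph-osd@$ID

    # pre-split the PG directories of one pool on this OSD according
    # to the currently configured filestore split/merge settings
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$ID \
        --journal-path /var/lib/ceph/osd/ceph-$ID/journal \
        --op apply-layout-settings --pool rbd

    systemctl start ceph-osd@$ID
    ceph osd unset noout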
On Wed, Apr 12, 2017 at 8:55 AM Jogi Hofmüller <jogi@xxxxxx> wrote:
Dear all,
We run a small cluster [1] that is used exclusively for
virtualisation (kvm/libvirt). Recently we started running into
performance problems (slow requests, failing OSDs) for no *obvious*
reason (at least not obvious to us).
We do nightly snapshots of VM images and keep the snapshots for 14
days. Currently we run 8 VMs in the cluster.
At first it looked like the problem was related to snapshotting
images of VMs that were up and running (or to deleting those
snapshots after 14 days). So we changed the procedure to first
suspend the VM and then snapshot its image(s). Snapshots are made at
4 am.
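Roughly, the nightly job amounts to something like this (pool, image
and VM names here are only examples, not our actual names):

    # quiesce the guest before snapshotting
    virsh suspend vm01

    # snapshot each of its RBD images
    rbd snap create rbd/vm01-disk0@nightly-$(date +%Y%m%d)

    virsh resume vm01

    # two weeks later the snapshot is removed again with
    # rbd snap rm rbd/vm01-disk0@nightly-<date>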
When we removed *all* the old snapshots (the ones taken of running
VMs) the cluster suddenly behaved normally again, but after two days
of creating snapshots (not deleting any) of suspended VMs, the slow
requests started again (although not nearly as frequently as before).
This morning we experienced a series of failures (e.g. "osd.2
IPv4:6800/1621 failed (2 reporters from different host after
49.976472 >= grace 46.444312)") of 4 of our 6 OSDs, resulting in
HEALTH_WARN with up to about 20% of PGs active+undersized+degraded,
stale+active+clean or remapped+peering. No OSD failure lasted longer
than 4 minutes, and after 15 minutes everything was back to normal.
The noise started at 6:25 am, the time when our cron.daily scripts
run.
We have no clue what could have caused this behavior :( There seems to
be no shortage of resources (CPU, RAM, network) that would explain what
happened, but maybe we did not look in the right places. So any hint on
where to look/what to look for would be greatly appreciated :)
[1] cluster setup
Three nodes: ceph1, ceph2, ceph3
ceph1 and ceph2
1x Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz
32 GB RAM
RAID1 for OS
1x Intel 530 Series SSDs (120GB) for Journals
3x WDC WD2500BUCT-63TWBY0 for OSDs (1TB)
2x Gbit Ethernet bonded (802.3ad) on HP 2920 Stack
ceph3
virtual machine
1 CPU
4 GB RAM
Software
Debian GNU/Linux Jessie (8.7)
Kernel 3.16
ceph 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
Ceph Services
3 Monitors: ceph1, ceph2, ceph3
6 OSDs: ceph1 (3), ceph2 (3)
Regards,
--
J.Hofmüller
Nisiti
- Abie Nathan, 1927-2008
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com