I can almost guarantee that what you're seeing is PG subfolder
splitting. When a subfolder in a PG reaches a configured number of
objects, it splits into 16 new subfolders. Every cluster I manage
gets blocked requests and OSDs that are marked down while this is
happening. To stop the OSDs from being marked down, I increase
osd_heartbeat_grace until the OSDs no longer mark themselves down
during this process. Based on your email, 5 minutes looks like a good
place to start. The blocked requests will still persist, but at least
the OSDs aren't being marked down regularly and adding peering to the
headache.
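For reference, here is a rough sketch of what I do (the 300 second
value is only an example based on your grace numbers):

    # check the current grace on a running OSD (via its admin socket)
    ceph daemon osd.0 config get osd_heartbeat_grace

    # raise it at runtime on all OSDs, and on the mons, which enforce
    # the same setting when deciding to mark an OSD down
    ceph tell osd.* injectargs '--osd_heartbeat_grace 300'
    ceph tell mon.* injectargs '--osd_heartbeat_grace 300'

    # and persist it in ceph.conf under [global]:
    #   osd heartbeat grace = 300

The splitting itself is governed by filestore_merge_threshold and
filestore_split_multiple; a subfolder splits once it holds roughly
16 * filestore_split_multiple * abs(filestore_merge_threshold)
objects.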
In 10.2.5 and 0.94.9, a way was added to take an OSD offline and tell
it to split the subfolders of its PGs. I haven't done this myself
yet, but I plan to figure it out the next time I come across this
sort of behavior.
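If I'm reading the release notes right, it's the ceph-objectstore-tool
'apply-layout-settings' operation. Untested by me, so treat the
following as a sketch only ($ID and the pool name are placeholders,
and the paths assume a default filestore layout with systemd units):

    ceph osd set noout
    systemctl stop ceph-osd@$ID

    # pre-split the PG directories of one pool on this OSD according
    # to the currently configured filestore split/merge settings
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$ID \
        --journal-path /var/lib/ceph/osd/ceph-$ID/journal \
        --op apply-layout-settings --pool rbd

    systemctl start ceph-osd@$ID
    ceph osd unset noout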
On Wed, Apr 12, 2017 at 8:55 AM Jogi Hofmüller <jogi@xxxxxx> wrote:
Dear all,
We run a small cluster [1] that is used exclusively for
virtualisation (kvm/libvirt). Recently we started running into
performance problems (slow requests, failing OSDs) for no *obvious*
reason (at least not obvious to us).
We do nightly snapshots of VM images and keep the snapshots for 14
days. Currently we run 8 VMs in the cluster.
At first it looked like the problem was related to snapshotting
images of VMs that were up and running (or to deleting those
snapshots after 14 days). So we changed the procedure to first
suspend the VM and then snapshot its image(s). Snapshots are made at
4 am.
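Roughly, the nightly job amounts to something like this (pool, image
and VM names here are only examples, not our actual names):

    # quiesce the guest before snapshotting
    virsh suspend vm01

    # snapshot each of its RBD images
    rbd snap create rbd/vm01-disk0@nightly-$(date +%Y%m%d)

    virsh resume vm01

    # two weeks later the snapshot is removed again with
    # rbd snap rm rbd/vm01-disk0@nightly-<date>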
When we removed *all* the old snapshots (the ones taken of running
VMs) the cluster suddenly behaved normally again, but after two days
of creating snapshots (not deleting any) of suspended VMs, the slow
requests started again (although not nearly as frequently as before).
This morning we experienced a series of failures (e.g. "osd.2
IPv4:6800/1621 failed (2 reporters from different host after
49.976472 >= grace 46.444312)") of 4 of our 6 OSDs, resulting in
HEALTH_WARN with up to about 20% of PGs active+undersized+degraded,
stale+active+clean or remapped+peering. No OSD failure lasted longer
than 4 minutes, and after 15 minutes everything was back to normal.
The noise started at 6:25 am, the time when our cron.daily scripts
run.
We have no clue what could have caused this behavior :( There seems to
be no shortage of resources (CPU, RAM, network) that would explain what
happened, but maybe we did not look in the right places. So any hint on
where to look/what to look for would be greatly appreciated :)
[1] cluster setup
Three nodes: ceph1, ceph2, ceph3
ceph1 and ceph2
1x Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz
32 GB RAM
RAID1 for OS
1x Intel 530 Series SSDs (120GB) for Journals
3x WDC WD2500BUCT-63TWBY0 for OSDs (1TB)
2x Gbit Ethernet bonded (802.3ad) on HP 2920 Stack
ceph3
virtual machine
1 CPU
4 GB RAM
Software
Debian GNU/Linux Jessie (8.7)
Kernel 3.16
ceph 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
Ceph Services
3 Monitors: ceph1, ceph2, ceph3
6 OSDs: ceph1 (3), ceph2 (3)
Regards,
--
J.Hofmüller
Nisiti
- Abie Nathan, 1927-2008
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com