Re: slow requests and short OSD failures in small cluster

I can almost guarantee that what you're seeing is PG subfolder splitting.  When the subfolders in a PG reach a certain number of objects, each of them splits into 16 subfolders.  Every cluster I manage gets blocked requests and OSDs marked down while this is happening.  To stop the OSDs from being marked down, I increase osd_heartbeat_grace until they no longer mark themselves down during this process.  Based on your email, starting at 5 minutes looks like a good value.  The blocked requests will still persist, but at least the OSDs aren't being marked down regularly and adding peering to the headache.
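For what it's worth, this is roughly how I raise the grace period.  The 300 seconds below is just the 5-minute starting point mentioned above, and to my knowledge the value has to be seen by both the mons and the OSDs, so [global] in ceph.conf is the safer place for it:

    # inject at runtime (lost on the next daemon restart)
    ceph tell osd.* injectargs '--osd-heartbeat-grace 300'
    ceph tell mon.* injectargs '--osd-heartbeat-grace 300'

    # make it persistent
    [global]
    osd heartbeat grace = 300

If memory serves, a subfolder splits once it holds roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects, so raising those two settings postpones the splitting at the cost of larger directories.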

In 10.2.5 and 0.94.9, a way was added to take an OSD offline and tell it to split the subfolders of its PGs.  I haven't done this myself yet, but I plan to figure it out the next time I come across this sort of behavior.
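Since I haven't run it, treat the following as a sketch rather than a recipe: as far as I can tell the offline split is done per OSD with ceph-objectstore-tool while the daemon is stopped (paths and the pool name are just examples for a default filestore layout):

    ceph osd set noout
    systemctl stop ceph-osd@2
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --journal-path /var/lib/ceph/osd/ceph-2/journal \
        --op apply-layout-settings --pool rbd
    systemctl start ceph-osd@2
    ceph osd unset noout

I'd do one OSD at a time, with noout set as above, so the cluster doesn't start backfilling while the OSD is down.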

On Wed, Apr 12, 2017 at 8:55 AM Jogi Hofmüller <jogi@xxxxxx> wrote:
Dear all,

we run a small cluster [1] that is exclusively used for virtualisation
(kvm/libvirt). Recently we started to run into performance problems
(slow requests, failing OSDs) for no *obvious* reason (at least not for
us).

We do nightly snapshots of VM images and keep the snapshots for 14
days. Currently we run 8 VMs in the cluster.

At first it looked like the problem was related to snapshotting images
of VMs that were up and running (or to deleting those snapshots after
14 days). So we changed the procedure to first suspend the VM and then
snapshot its image(s). Snapshots are made at 4 am.
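In essence the nightly job boils down to something like this (domain,
pool and image names are only placeholders):

    virsh suspend vm01
    rbd snap create rbd/vm01-disk1@nightly-2017-04-12
    virsh resume vm01

    # 14 days later the corresponding old snapshot is removed
    rbd snap rm rbd/vm01-disk1@nightly-2017-03-29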

When we removed *all* the old snapshots (the ones taken of running VMs)
the cluster suddenly behaved 'normally' again, but after two days of
creating snapshots (not deleting any) of suspended VMs, the slow
requests started again (although far less frequently than before).

This morning we experienced successive failures of 4 of our 6 OSDs
(e.g. "osd.2 IPv4:6800/1621 failed (2 reporters from different host
after 49.976472 >= grace 46.444312)"), resulting in HEALTH_WARN with
up to about 20% of PGs active+undersized+degraded, stale+active+clean
or remapped+peering. No OSD failure lasted longer than 4 minutes, and
after 15 minutes everything was back to normal again. The noise started
at 6:25 am, the time when cron.daily scripts run here.

We have no clue what could have caused this behavior :( There seems to
be no shortage of resources (CPU, RAM, network) that would explain what
happened, but maybe we did not look in the right places. So any hint on
where to look/what to look for would be greatly appreciated :)

[1]  cluster setup

Three nodes: ceph1, ceph2, ceph3

ceph1 and ceph2

    1x Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz
    32 GB RAM
    RAID1 for OS
    1x Intel 530 Series SSDs (120GB) for Journals
    3x WDC WD2500BUCT-63TWBY0 for OSDs (1TB)
    2x Gbit Ethernet bonded (802.3ad) on HP 2920 Stack 

ceph3

    virtual machine
    1 CPU
    4 GB RAM 

Software

    Debian GNU/Linux Jessie (8.7)
    Kernel 3.16
    ceph 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f) 

Ceph Services

3 Monitors: ceph1, ceph2, ceph3

6 OSDs: ceph1 (3), ceph2 (3) 

Regards,
--
J.Hofmüller

           Nisiti
           - Abie Nathan, 1927-2008

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
