Hello,

I have a Ceph cluster with the specifications below:

3 x monitor node
6 x storage node (6 disks per storage node, 6 TB SATA disks, all disks have SSD journals)
Separate public and cluster (private) networks; all NICs are 10 Gbit/s
osd pool default size = 3
osd pool default min size = 2
Ceph version is Jewel 10.2.6.

Current health status:

    cluster ****************
     health HEALTH_OK
     monmap e9: 3 mons at {ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
            election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e1512: 36 osds: 36 up, 36 in
            flags sortbitwise,require_jewel_osds
      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
            83871 GB used, 114 TB / 196 TB avail
                1408 active+clean

The cluster is in production and a lot of virtual machines are running on it (Linux and Windows VMs, database clusters, web servers, etc.).

When I try to add a new storage node with one disk, I run into serious problems. With the new OSD, the CRUSH map is updated and the cluster goes into recovery. At first everything is OK, but after a while some of the running VMs become unmanageable and the servers become unresponsive one by one. The recovery process would take about 20 hours on average, so I removed the new OSD; once recovery completed, everything went back to normal.

Health status with the new OSD added:

    cluster ****************
     health HEALTH_WARN
            91 pgs backfill_wait
            1 pgs backfilling
            28 pgs degraded
            28 pgs recovery_wait
            28 pgs stuck degraded
            recovery 2195/18486602 objects degraded (0.012%)
            recovery 1279784/18486602 objects misplaced (6.923%)
     monmap e9: 3 mons at {ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
            election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e1512: 37 osds: 37 up, 37 in
            flags sortbitwise,require_jewel_osds
      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
            83871 GB used, 114 TB / 201 TB avail
            2195/18486602 objects degraded (0.012%)
            1279784/18486602 objects misplaced (6.923%)
                1286 active+clean
                  91 active+remapped+wait_backfill
                  28 active+recovery_wait+degraded
                   2 active+clean+scrubbing+deep
                   1 active+remapped+backfilling
    recovery io 430 MB/s, 119 objects/s
      client io 36174 B/s rd, 5567 kB/s wr, 5 op/s rd, 700 op/s wr

Some relevant Ceph config parameters:

    osd_max_backfills = 1
    osd_backfill_full_ratio = 0.85
    osd_recovery_max_active = 3
    osd_recovery_threads = 1

How can I add new OSDs safely?

Best regards,
Ramazan
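P.S. For reference, a minimal sketch of how these throttling settings look in the [osd] section of ceph.conf, together with the usual injectargs command for pushing the same values to the running OSDs without a restart (the values are just the ones already listed above):

    # [osd] section of ceph.conf -- recovery/backfill throttling
    [osd]
    osd_max_backfills = 1            # at most one concurrent backfill per OSD
    osd_backfill_full_ratio = 0.85   # refuse backfill to OSDs more than 85% full
    osd_recovery_max_active = 3      # at most three active recovery ops per OSD
    osd_recovery_threads = 1         # single recovery worker thread per OSD

    # Same values applied to all running OSDs at runtime:
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'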