Hello,

I have a Ceph cluster with the specifications below:

3 x monitor node
6 x storage node (6 disks per storage node, 6 TB SATA disks, all disks have SSD journals)
Separate public and cluster (private) networks; all NICs are 10 Gbit/s
osd pool default size = 3
osd pool default min size = 2
Ceph version is Jewel 10.2.6.

Current health status:

    cluster ****************
     health HEALTH_OK
     monmap e9: 3 mons at {ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
            election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e1512: 36 osds: 36 up, 36 in
            flags sortbitwise,require_jewel_osds
      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
            83871 GB used, 114 TB / 196 TB avail
                1408 active+clean

The cluster is in production and a lot of virtual machines are running on it (Linux and Windows VMs, database clusters, web servers, etc.).

When I try to add a new storage node with one disk, I run into serious problems. With the new OSD, the CRUSH map is updated and the cluster goes into recovery. At first everything is OK, but after a while some of the running VMs become unmanageable and the servers become unresponsive one by one. The recovery process would take about 20 hours on average, so I removed the new OSD; once recovery completed, everything went back to normal.

Health status with the new OSD added:

    cluster ****************
     health HEALTH_WARN
            91 pgs backfill_wait
            1 pgs backfilling
            28 pgs degraded
            28 pgs recovery_wait
            28 pgs stuck degraded
            recovery 2195/18486602 objects degraded (0.012%)
            recovery 1279784/18486602 objects misplaced (6.923%)
     monmap e9: 3 mons at {ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
            election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e1512: 37 osds: 37 up, 37 in
            flags sortbitwise,require_jewel_osds
      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
            83871 GB used, 114 TB / 201 TB avail
            2195/18486602 objects degraded (0.012%)
            1279784/18486602 objects misplaced (6.923%)
                1286 active+clean
                  91 active+remapped+wait_backfill
                  28 active+recovery_wait+degraded
                   2 active+clean+scrubbing+deep
                   1 active+remapped+backfilling
    recovery io 430 MB/s, 119 objects/s
      client io 36174 B/s rd, 5567 kB/s wr, 5 op/s rd, 700 op/s wr

Some relevant Ceph config parameters:

    osd_max_backfills = 1
    osd_backfill_full_ratio = 0.85
    osd_recovery_max_active = 3
    osd_recovery_threads = 1

How can I add new OSDs safely?

Best regards,
Ramazan
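P.S. For reference, a minimal sketch of how these throttling settings look in the [osd] section of ceph.conf, together with the usual injectargs command for pushing the same values to the running OSDs without a restart (the values are just the ones already listed above):

    # [osd] section of ceph.conf -- recovery/backfill throttling
    [osd]
    osd_max_backfills = 1            # at most one concurrent backfill per OSD
    osd_backfill_full_ratio = 0.85   # refuse backfill to OSDs more than 85% full
    osd_recovery_max_active = 3      # at most three active recovery ops per OSD
    osd_recovery_threads = 1         # single recovery worker thread per OSD

    # Same values applied to all running OSDs at runtime:
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'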