Hello,

A few days ago I ran into a very strange situation. I had to turn off a few OSDs for a while, so I set the flags noout, nobackfill and norecover and then turned off the selected OSDs. All was OK, but when I started these OSDs again, all VMs went down because of the recovery process (even though the recovery priority was very low).

Here are the more important config values:

    "osd_recovery_threads": "1",
    "osd_recovery_thread_timeout": "30",
    "osd_recovery_thread_suicide_timeout": "300",
    "osd_recovery_delay_start": "0",
    "osd_recovery_max_active": "1",
    "osd_recovery_max_single_start": "5",
    "osd_recovery_max_chunk": "8388608",
    "osd_client_op_priority": "63",
    "osd_recovery_op_priority": "1",
    "osd_recovery_op_warn_multiple": "16",
    "osd_backfill_full_ratio": "0.85",
    "osd_backfill_retry_interval": "10",
    "osd_backfill_scan_min": "64",
    "osd_backfill_scan_max": "512",
    "osd_kill_backfill_at": "0",
    "osd_max_backfills": "1",

I don't know why Ceph started the recovery process while the norecover and nobackfill flags were set, but the fact is that it killed all VMs.

Next, I unset the noout, nobackfill and norecover flags and things started to look better. The VMs came back online while the recovery process was still running. I didn't see any performance impact on the SSD disks, but there was a huge impact on the spinners. Normally %util is about 25%, but during recovery it was nearly 100%. CPU load on the HDD-based VMs increased by ~400%.

iostat fragment (during recovery):

    Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    sdh        0.30    1.00  150.90  36.00  13665.60  954.60    156.45     10.63  56.88    25.60   188.02   5.34  99.80

Now I'm a little lost, and there are a few questions I can't answer:

1. Why did Ceph start recovery even though the nobackfill and norecover flags were set?
2. Why did recovery have a much bigger performance impact while the norecover and nobackfill flags were set?
3. Why, once norecover and nobackfill were unset, did the cluster start to look better while %util on the HDDs was still so high (with osd_recovery_op_priority=1 and osd_client_op_priority=63)?
   Is it normal that %util goes from the usual 25% to 100% during recovery?

Cluster information:

    ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
    3x nodes (CPU E5-2630, 32 GB RAM, 6x HDD 2 TB with SSD journal, 3x SSD 1 TB with NVMe journal), triple replication

I would be very grateful if somebody could help me.
Sorry if I've done something the wrong way - this is my first time writing to a mailing list.
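
P.S. For anyone who wants to double-check the iostat line above, here is a small Python sketch that recomputes some of the columns from the others. The helper names (parse_iostat_row, derived) are made up for this example, and the formulas are the usual sysstat relationships as far as I understand them (avgrq-sz counted in 512-byte sectors, %util roughly equal to (r/s + w/s) * svctm / 10), so please treat it as a sanity check under those assumptions and not as anything authoritative.

```python
# Header and data row copied from the iostat fragment in this mail.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
ROW = ("sdh 0.30 1.00 150.90 36.00 13665.60 954.60 156.45 10.63 "
       "56.88 25.60 188.02 5.34 99.80")


def parse_iostat_row(header, row):
    """Return (device_name, {column_name: float_value})."""
    names = header.split()[1:]          # drop the "Device:" label
    fields = row.split()
    values = dict(zip(names, (float(x) for x in fields[1:])))
    return fields[0], values


def derived(v):
    """Recompute a few columns from the raw rates (assumed sysstat formulas)."""
    iops = v["r/s"] + v["w/s"]
    return {
        "iops": iops,
        # kB/s * 2 = 512-byte sectors per second
        "avg_request_sectors": (v["rkB/s"] + v["wkB/s"]) * 2 / iops,
        # weighted average of read and write latency, in ms
        "await_ms": (v["r/s"] * v["r_await"] + v["w/s"] * v["w_await"]) / iops,
        # ms of service time per second of wall time, as a percentage
        "util_pct": iops * v["svctm"] / 10.0,
        "read_share": v["r/s"] / iops,
    }


if __name__ == "__main__":
    name, values = parse_iostat_row(HEADER, ROW)
    for key, value in derived(values).items():
        print(f"{name} {key}: {value:.2f}")
```

With the numbers from my fragment this comes out at roughly 187 requests/s on sdh, about 81% of them reads, and the recomputed request size, await and %util agree with the reported columns. So the line looks internally consistent, and the disk seems to be saturated by the number of requests and not by raw throughput.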