Ceph recovery kills VMs even with the lowest priority

Damian Dabrowski <scooty96@xxxxxxxxx> · Thu, 29 Mar 2018 16:27:04 +0200

Hello,

A few days ago I ran into a very strange situation.

I had to turn off a few OSDs for a while, so I set the noout,
nobackfill and norecover flags and then stopped the selected OSDs.
Everything was fine, but when I started these OSDs again, all VMs went
down because of the recovery process (even though the recovery priority
was set very low).
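
For reference, the flag handling described above can be sketched as
follows. This is a minimal sketch, assuming the standard
`ceph osd set <flag>` / `ceph osd unset <flag>` commands; the helper
names `flag_commands` and `apply_flags` are hypothetical, and nothing
is executed unless `apply_flags` is called on a host with the ceph CLI
and an admin keyring.

```python
import subprocess

# Flags set before stopping the OSDs, as listed in this post.
MAINTENANCE_FLAGS = ["noout", "nobackfill", "norecover"]


def flag_commands(flags, action="set"):
    """Build `ceph osd set|unset <flag>` command lines (hypothetical helper)."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def apply_flags(flags, action="set"):
    """Run the commands; needs the ceph CLI and admin access on this host."""
    for cmd in flag_commands(flags, action):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print only; call apply_flags() to actually change the cluster flags.
    for cmd in flag_commands(MAINTENANCE_FLAGS, "set"):
        print(" ".join(cmd))
```

Note that the flag is spelled `norecover`, not `norecovery`.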

Here are the more important config values:
    "osd_recovery_threads": "1",
    "osd_recovery_thread_timeout": "30",
    "osd_recovery_thread_suicide_timeout": "300",
    "osd_recovery_delay_start": "0",
    "osd_recovery_max_active": "1",
    "osd_recovery_max_single_start": "5",
    "osd_recovery_max_chunk": "8388608",
    "osd_client_op_priority": "63",
    "osd_recovery_op_priority": "1",
    "osd_recovery_op_warn_multiple": "16",
    "osd_backfill_full_ratio": "0.85",
    "osd_backfill_retry_interval": "10",
    "osd_backfill_scan_min": "64",
    "osd_backfill_scan_max": "512",
    "osd_kill_backfill_at": "0",
    "osd_max_backfills": "1",

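The values above have the shape of an OSD's JSON config dump. A minimal
sketch for pulling only the recovery- and backfill-related keys out of
such a dump, assuming it was obtained with something like
`ceph daemon osd.<id> config show` (an assumption, the post does not say
how the values were collected). `select_settings` is a hypothetical
helper, and the sample contains a few of the values quoted above.

```python
import json

# A few of the values quoted in this post, wrapped as a JSON object.
SAMPLE = """{
    "osd_recovery_max_active": "1",
    "osd_client_op_priority": "63",
    "osd_recovery_op_priority": "1",
    "osd_max_backfills": "1"
}"""


def select_settings(config_json, keywords=("recovery", "backfill")):
    """Return the options whose name contains any keyword, sorted by name."""
    config = json.loads(config_json)
    return {
        name: value
        for name, value in sorted(config.items())
        if any(word in name for word in keywords)
    }


if __name__ == "__main__":
    for name, value in select_settings(SAMPLE).items():
        print(f"{name} = {value}")
```
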
I don't know why Ceph started the recovery process while the
norecover and nobackfill flags were set, but the fact is that it killed
all the VMs.
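
One thing worth ruling out is whether all three flags were really set
at the time. A minimal sketch, assuming the `ceph osd dump` output
contains a line of the form `flags noout,nobackfill,...`; the sample
text below is hypothetical and not taken from this cluster, and both
function names are made up for illustration.

```python
EXPECTED_FLAGS = ("noout", "nobackfill", "norecover")


def parse_osd_flags(osd_dump_text):
    """Return the set of flags found on the `flags ...` line, if any."""
    for line in osd_dump_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "flags":
            return set(parts[1].split(",")) if len(parts) > 1 else set()
    return set()


def missing_flags(osd_dump_text, expected=EXPECTED_FLAGS):
    """List the expected flags that are not currently set."""
    return sorted(set(expected) - parse_osd_flags(osd_dump_text))


if __name__ == "__main__":
    # Hypothetical dump excerpt in which norecover was never applied.
    sample = "epoch 100\nflags noout,nobackfill\n"
    print(missing_flags(sample))
```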

Next, I unset the noout, nobackfill and norecover flags and things
started to look better. The VMs came back online while the recovery
process was still running. I didn't see any performance impact on the
SSDs, but there was a huge impact on the spinners.
Normally %util is about 25%, but during recovery it was nearly 100%.
CPU load on HDD-backed VMs increased by ~400%.

iostat fragment (during recovery):
Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdh        0.30    1.00  150.90  36.00  13665.60  954.60    156.45     10.63  56.88    25.60   188.02   5.34  99.80
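
The iostat columns are related to each other, so the sample can be
cross-checked. A minimal sketch using only the sdh values above; the
formulas are the usual relationships between iostat fields (request
size in 512-byte sectors, request-weighted await, Little's law for the
queue length) and are meant as a sanity check, not as the exact sysstat
implementation.

```python
# Values for sdh copied from the iostat sample in this post.
r_s, w_s = 150.90, 36.00          # reads/s, writes/s
rkb_s, wkb_s = 13665.60, 954.60   # kB/s read, kB/s written
r_await, w_await = 25.60, 188.02  # ms per read, ms per write
util_pct = 99.80                  # %util

iops = r_s + w_s
# Average request size in 512-byte sectors (1 kB = 2 sectors).
avgrq_sz = (rkb_s + wkb_s) * 2 / iops
# Overall await is the request-weighted mean of read and write await.
await_ms = (r_s * r_await + w_s * w_await) / iops
# Little's law: average queue length = request rate * time per request.
avgqu_sz = iops * await_ms / 1000
# Approximate service time: busy milliseconds per second / requests per second.
svctm_ms = util_pct / 100 * 1000 / iops

print(f"avgrq-sz {avgrq_sz:.2f}  await {await_ms:.2f}  "
      f"avgqu-sz {avgqu_sz:.2f}  svctm {svctm_ms:.2f}")
```

The derived figures agree with the reported columns, so the sample is
internally consistent: about 187 requests per second averaging roughly
78 kB each, with writes waiting far longer (188 ms) than reads (26 ms).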

Now I'm a little lost; I don't know the answers to a few questions.
1. Why did Ceph start recovery even though the nobackfill and norecover
flags were set?
2. Why did recovery have a much bigger performance impact while the
norecover and nobackfill flags were set?
3. Why, once norecover and nobackfill were unset, did the cluster start
to look better while %util on the HDDs was still so high (with
osd_recovery_op_priority=1 and osd_client_op_priority=63)? About 25% is
normal, but it rose to nearly 100% during recovery.

Cluster information:
ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
3 nodes (CPU E5-2630, 32GB RAM, 6x 2TB HDD with SSD journal, 3x 1TB SSD
with NVMe journal), triple replication

I would be very grateful if somebody could help me.
Sorry if I've done something the wrong way; this is my first time
writing to a mailing list.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com