Speeding up backfill after increasing PGs and/or adding OSDs

<george.vasilakakos@xxxxxxxxxx> · Thu, 6 Jul 2017 14:08:34 +0000

Hey folks,

We have a cluster that's currently backfilling after an increase in PG counts. We tuned recovery and backfill way down as a "precaution" and would now like to tune them back up to reach a good balance between recovery and client I/O.

At the moment we're in the process of bumping up PG numbers for pools serving production workloads. Said pools are EC 8+3.

It looks like very few PGs are backfilling at any one time, as in:

            2567 TB used, 5062 TB / 7630 TB avail
            145588/849529410 objects degraded (0.017%)
            5177689/849529410 objects misplaced (0.609%)
                7309 active+clean
                  23 active+clean+scrubbing
                  18 active+clean+scrubbing+deep
                  13 active+remapped+backfill_wait
                   5 active+undersized+degraded+remapped+backfilling
                   4 active+undersized+degraded+remapped+backfill_wait
                   3 active+remapped+backfilling
                   1 active+clean+inconsistent
recovery io 1966 MB/s, 96 objects/s
  client io 726 MB/s rd, 147 MB/s wr, 89 op/s rd, 71 op/s wr

Also, the rate of recovery in terms of data and object throughput varies a lot, even with the number of PGs backfilling remaining constant.
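
To put numbers on "very low", here is a minimal Python sketch that tallies the PG states from the status output above. The function name `tally` and the pasted `STATUS` string are just for illustration; only the counts come from the output itself.

```python
import re

# PG state lines copied from the status output above.
STATUS = """\
7309 active+clean
23 active+clean+scrubbing
18 active+clean+scrubbing+deep
13 active+remapped+backfill_wait
5 active+undersized+degraded+remapped+backfilling
4 active+undersized+degraded+remapped+backfill_wait
3 active+remapped+backfilling
1 active+clean+inconsistent
"""


def tally(status_text):
    """Return (total PGs, PGs backfilling, PGs waiting for backfill)."""
    total = backfilling = waiting = 0
    for line in status_text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if not match:
            continue
        count = int(match.group(1))
        flags = match.group(2).split("+")
        total += count
        if "backfilling" in flags:
            backfilling += count
        elif "backfill_wait" in flags:
            waiting += count
    return total, backfilling, waiting


total, backfilling, waiting = tally(STATUS)
print(f"{backfilling} of {total} PGs backfilling, {waiting} waiting")
```

So only 8 of 7376 PGs are actively backfilling, with another 17 queued in backfill_wait.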

Here's the config in the OSDs:

    "osd_max_backfills": "1",
    "osd_min_recovery_priority": "0",
    "osd_backfill_full_ratio": "0.85",
    "osd_backfill_retry_interval": "10",
    "osd_allow_recovery_below_min_size": "true",
    "osd_recovery_threads": "1",
    "osd_backfill_scan_min": "16",
    "osd_backfill_scan_max": "64",
    "osd_recovery_thread_timeout": "30",
    "osd_recovery_thread_suicide_timeout": "300",
    "osd_recovery_sleep": "0",
    "osd_recovery_delay_start": "0",
    "osd_recovery_max_active": "5",
    "osd_recovery_max_single_start": "1",
    "osd_recovery_max_chunk": "8388608",
    "osd_recovery_max_omap_entries_per_chunk": "64000",
    "osd_recovery_forget_lost_objects": "false",
    "osd_scrub_during_recovery": "false",
    "osd_kill_backfill_at": "0",
    "osd_debug_skip_full_check_in_backfill_reservation": "false",
    "osd_debug_reject_backfill_probability": "0",
    "osd_recovery_op_priority": "5",
    "osd_recovery_priority": "5",
    "osd_recovery_cost": "20971520",
    "osd_recovery_op_warn_multiple": "16",

What I'm looking for, first of all, is a better understanding of the mechanism that schedules the backfill/recovery work. The end goal is to understand how to tune it safely and get as close as possible to an optimal balance between the rates at which recovery and client work are performed.

I'm thinking that settings like osd_max_backfills and osd_backfill_scan_min/osd_backfill_scan_max might be prime candidates for tuning.
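
As a rough sketch of how such a change could be applied at runtime, the Python below builds a `ceph tell osd.* injectargs` command line for a given set of options. The helper name `injectargs_command` and the value 2 are hypothetical choices for illustration, not a recommendation; the script only prints the command and does not run it.

```python
import shlex


def injectargs_command(settings, target="osd.*"):
    """Build the argv for `ceph tell <target> injectargs '<options>'`.

    settings maps option names (as in the config dump) to new values.
    """
    # injectargs takes the options as one string in --dashed-name form.
    options = " ".join(
        f"--{name.replace('_', '-')} {value}" for name, value in settings.items()
    )
    return ["ceph", "tell", target, "injectargs", options]


# Hypothetical example value: raise osd_max_backfills from 1 to 2.
cmd = injectargs_command({"osd_max_backfills": 2})

# Print a shell-quoted form for review; nothing is executed here.
print(shlex.join(cmd))
```

The idea would be to raise one setting in small steps and watch client I/O latency after each step, rather than changing several settings at once.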

Any thoughts/insights from the Ceph community will be greatly appreciated,

George
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com