Re: Slow backfilling with bluestore, ssd and metadatapools

Hi,


On 12/21/2017 11:43 AM, Richard Hesketh wrote:
On 21/12/17 10:28, Burkhard Linke wrote:
OSD config section from ceph.conf:

[osd]
osd_scrub_sleep = 0.05
osd_journal_size = 10240
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
max_pg_per_osd_hard_ratio = 4.0
osd_max_pg_per_osd_hard_ratio = 4.0
bluestore_cache_size_hdd = 5368709120
mon_max_pg_per_osd = 400
Consider also playing with the following OSD parameters:

osd_recovery_max_active
osd_recovery_sleep
osd_recovery_sleep_hdd
osd_recovery_sleep_hybrid
osd_recovery_sleep_ssd

In my anecdotal experience, the forced wait between requests (controlled by the recovery_sleep parameters) caused a significant slowdown in recovery on my cluster, though even at the default values it didn't make things go nearly as slowly as in your cluster - it sounds like something else is probably wrong.
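
For what it's worth, these can also be adjusted at runtime without restarting the OSDs (a sketch assuming Luminous-style injectargs; the values are purely illustrative, not recommendations):

# ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.0 --osd_recovery_max_active 5'
# ceph daemon osd.93 config get osd_recovery_sleep_hdd

Note that injectargs changes are not persistent; anything you want to keep across OSD restarts still has to go into ceph.conf.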

Thanks for the hint. I've been thinking about recovery_sleep, too, but the default for SSD OSDs is already 0.0:

# ceph daemon osd.93 config show | grep recovery
    "osd_allow_recovery_below_min_size": "true",
    "osd_debug_skip_full_check_in_recovery": "false",
    "osd_force_recovery_pg_log_entries_factor": "1.300000",
    "osd_min_recovery_priority": "0",
    "osd_recovery_cost": "20971520",
    "osd_recovery_delay_start": "0.000000",
    "osd_recovery_forget_lost_objects": "false",
    "osd_recovery_max_active": "3",
    "osd_recovery_max_chunk": "8388608",
    "osd_recovery_max_omap_entries_per_chunk": "64000",
    "osd_recovery_max_single_start": "1",
    "osd_recovery_op_priority": "3",
    "osd_recovery_op_warn_multiple": "16",
    "osd_recovery_priority": "5",
    "osd_recovery_retry_interval": "30.000000",
    "osd_recovery_sleep": "0.000000",
    "osd_recovery_sleep_hdd": "0.100000",
    "osd_recovery_sleep_hybrid": "0.025000",
    "osd_recovery_sleep_ssd": "0.000000",
    "osd_recovery_thread_suicide_timeout": "300",
    "osd_recovery_thread_timeout": "30",
    "osd_scrub_during_recovery": "false",

osd.93 is one of the SSD OSDs I recreated with bluestore about three hours ago. All recovery-related values are at their defaults. Since my first mail an hour ago, the PG has made some progress:

8.101      7580                  0        0      2777 0           0 2496     2496 active+remapped+backfilling 2017-12-21 09:03:30.429605 543455'1013006    543518:1927782 [78,34,49]         78                     [78,34,19] 78    522371'1009118 2017-12-18 16:11:29.755231    522371'1009118 2017-12-18 16:11:29.755231

So roughly 2000 objects of this PG have been copied to the new SSD-based OSD (acting [78,34,19] -> up [78,34,49], i.e. one new copy on osd.49).
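
For reference, I'm simply comparing the pg dump output between mails; something along these lines also works for keeping an eye on the backfill (the pgid is of course specific to this case):

# ceph pg dump pgs | grep '^8\.101'
# ceph pg 8.101 query | grep num_objects_recovered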


Regards,
Burkhard