Re: osd backfills and recovery limit issue

The documentation for osd_max_backfills is quoted below.

osd max backfills

Description: The maximum number of backfills allowed to or from a single OSD.
Type: 64-bit Unsigned Integer
Default: 1

So I think the option does not limit the number of OSDs involved in backfill activity, only the backfills per OSD.
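
If you want to double-check or change the per-OSD value on a running daemon without restarting it, something like this should work (just a sketch; osd.0 is only an example ID, and the output format may differ slightly by release):

# ceph daemon osd.0 config get osd_max_backfills
{
    "osd_max_backfills": "1"
}
# ceph tell osd.* injectargs '--osd-max-backfills 1'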





On Aug 10, 2017, at 1:58 PM, Hyun Ha <hfamily15@xxxxxxxxx> wrote:

Thank you for your comment.

I understand what you mean.
When one OSD goes down, its PGs are spread across the whole cluster, so each remaining OSD can run one backfill/recovery at a time and the cluster as a whole shows many backfills/recoveries.
On the other hand, when one OSD comes up, that OSD has to copy PGs one by one from the other nodes, so the cluster shows only one backfill/recovery at a time.
Is that right?

When a host or OSD goes down, the performance impact can be larger than when a host or OSD comes up.
So, is there any configuration to limit the number of OSDs involved when Ceph is doing recovery/backfill?
Or is it possible to push more recovery/backfill when system resource usage (CPU, memory, network throughput, etc.) is low, like recovery scheduling?
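For example, I am imagining something like a cron job that injects higher throttle values when load is low and lowers them again during busy hours (only a sketch of the idea; the values here are arbitrary):

# ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'   <- off-peak
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'   <- busy hours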

Thank you.

2017-08-10 13:31 GMT+09:00 David Turner <drakonstein@xxxxxxxxx>:
osd_max_backfills is a per-OSD setting. With it set to 1, each OSD will only be involved in a single backfill/recovery at a time. However, the cluster as a whole will run as many backfills as it can while each OSD is only involved in one.
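If you want to watch that cluster-wide count for yourself while recovery is running, something along these lines should work (rough sketch; the exact PG state strings depend on your release):

# ceph pg dump pgs_brief 2>/dev/null | grep -c backfilling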

On Wed, Aug 9, 2017 at 10:58 PM 하현 <hfamily15@xxxxxxxxx> wrote:
Hi ceph experts.

I am confused about how the osd max backfills limit works.
Recovery/backfills occur when an OSD goes down, and the same happens when an OSD comes up.

I want to limit backfills to 1, so I set the config as below.


# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show|egrep "osd_max_backfills|osd_recovery_threads|osd_recovery_max_active|osd_recovery_op_priority"
    "osd_max_backfills": "1",
    "osd_recovery_threads": "1",
    "osd_recovery_max_active": "1",
    "osd_recovery_op_priority": "3",

When an OSD comes up it seems to work as expected, but when an OSD goes down it does not work the way I expected.
Please see the ceph -w output below.

osd down>
pgmap v898158: 2048 pgs: 20 remapped+peering, 106 active+undersized+degraded, 1922 active+clean; 641 B/s rd, 253 kB/s wr, 36 op/s; 45807/1807242 objects degraded (2.535%)
pgmap v898159: 2048 pgs: 5 active+undersized+degraded+remapped+backfilling, 9 activating+undersized+degraded+remapped, 24 active+undersized+degraded+remapped+wait_backfill, 20 remapped+peering, 68 active+undersized+degraded, 1922 active+clean; 510 B/s rd, 498 kB/s wr, 42 op/s; 41619/1812733 objects degraded (2.296%); 21029/1812733 objects misplaced (1.160%); 149 MB/s, 37 objects/s recovering
pgmap v898168: 2048 pgs: 16 active+undersized+degraded+remapped+backfilling, 110 active+undersized+degraded+remapped+wait_backfill, 1922 active+clean; 508 B/s rd, 562 kB/s wr, 61 op/s; 54118/1823939 objects degraded (2.967%); 86984/1823939 objects misplaced (4.769%); 4025 MB/s, 1006 objects/s recovering
pgmap v898192: 2048 pgs: 3 peering, 1 activating, 13 active+undersized+degraded+remapped+backfilling, 106 active+undersized+degraded+remapped+wait_backfill, 1925 active+clean; 10184 B/s rd, 362 kB/s wr, 47 op/s; 49724/1823312 objects degraded (2.727%); 79709/1823312 objects misplaced (4.372%); 1949 MB/s, 487 objects/s recovering
pgmap v898216: 2048 pgs: 1 active+undersized+remapped, 11 active+undersized+degraded+remapped+backfilling, 98 active+undersized+degraded+remapped+wait_backfill, 1938 active+clean; 10164 B/s rd, 251 kB/s wr, 37 op/s; 44429/1823312 objects degraded (2.437%); 74037/1823312 objects misplaced (4.061%); 2751 MB/s, 687 objects/s recovering
pgmap v898541: 2048 pgs: 1 active+undersized+degraded+remapped+backfilling, 2047 active+clean; 218 kB/s wr, 39 op/s; 261/1806097 objects degraded (0.014%); 543/1806097 objects misplaced (0.030%); 677 MB/s, 9 keys/s, 176 objects/s recovering

osd up>
pgmap v899274: 2048 pgs: 2 activating, 14 peering, 12 remapped+peering, 2020 active+clean; 5594 B/s rd, 452 kB/s wr, 54 op/s
pgmap v899277: 2048 pgs: 1 active+remapped+backfilling, 41 active+remapped+wait_backfill, 2 activating, 14 peering, 1990 active+clean; 595 kB/s wr, 23 op/s; 36111/1823939 objects misplaced (1.980%); 380 MB/s, 95 objects/s recovering
pgmap v899298: 2048 pgs: 1 peering, 1 active+remapped+backfilling, 40 active+remapped+wait_backfill, 2006 active+clean; 723 kB/s wr, 13 op/s; 34903/1823294 objects misplaced (1.914%); 1113 MB/s, 278 objects/s recovering
pgmap v899342: 2048 pgs: 1 active+remapped+backfilling, 39 active+remapped+wait_backfill, 2008 active+clean; 5615 B/s rd, 291 kB/s wr, 41 op/s; 33150/1822666 objects misplaced (1.819%)
pgmap v899796: 2048 pgs: 1 activating, 1 active+remapped+backfilling, 10 active+remapped+wait_backfill, 2036 active+clean; 235 kB/s wr, 22 op/s; 6423/1809085 objects misplaced (0.355%)

In the osd down> logs we can see up to 16 backfills, but in the osd up> logs we can see only one backfill. Is that correct? If not, what config should I set?
Thank you in advance.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
