Jan Kasprzak wrote:
: Okay, now I changed the crush rule also on a pool with
: the real data, and it seems all the client I/O on that pool has stopped.
: The recovery continues, but things like qemu I/O, "rbd ls", and so on
: are just stuck doing nothing.
:
: Can I get it unstuck somehow (faster than waiting for all the recovery
: to finish)? Thanks.

I was able to briefly reduce the "1721 pgs inactive" number by restarting
some of the original filestore OSDs, but after some time the number
increased back to 1721.

Then the data recovery finished, and 1721 PGs remained inactive (and, of
course, I/O on this pool was stuck, both qemu and "rbd ls").

So I reverted to the original crush rule; the data started to migrate back
to the original OSDs, and the client I/O got unstuck (even though the data
relocation is still in progress).

Where could the problem be? Could I be hitting the limit on the number of
PGs per OSD, or something similar? I had 60 OSDs before, and want to move
everything to 20 new OSDs instead. The pool in question has 2048 PGs.
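To put rough numbers on that suspicion (a back-of-the-envelope check only,
assuming 3x replication and the Mimic default mon_max_pg_per_osd of 250,
which I may be misremembering):

    2048 PGs * 3 replicas / 20 OSDs = ~307 PG instances per OSD

which would already be above the limit even before counting the other pools.
If this is the cause, "ceph osd df tree" should show the per-OSD PG counts
in its PGS column, and the limit could presumably be raised with something
like (syntax from memory, and 400 is just an example value, so please
double-check):

# ceph config set global mon_max_pg_per_osd 400

The related osd_max_pg_per_osd_hard_ratio option might also play a role here.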
Thanks,

-Yenya

:
: # ceph -s
:   cluster:
:     id:     ... my-uuid ...
:     health: HEALTH_ERR
:             3308311/3803892 objects misplaced (86.972%)
:             Reduced data availability: 1721 pgs inactive
:             Degraded data redundancy: 85361/3803892 objects degraded (2.244%), 139 pgs degraded, 139 pgs undersized
:             Degraded data redundancy (low space): 25 pgs backfill_toofull
:
:   services:
:     mon: 3 daemons, quorum mon1,mon2,mon3
:     mgr: mon2(active), standbys: mon1, mon3
:     osd: 80 osds: 80 up, 80 in; 1868 remapped pgs
:     rgw: 1 daemon active
:
:   data:
:     pools:   13 pools, 5056 pgs
:     objects: 1.27 M objects, 4.8 TiB
:     usage:   15 TiB used, 208 TiB / 224 TiB avail
:     pgs:     34.039% pgs not active
:              85361/3803892 objects degraded (2.244%)
:              3308311/3803892 objects misplaced (86.972%)
:              3188 active+clean
:              1582 activating+remapped
:              139  activating+undersized+degraded+remapped
:              93   active+remapped+backfill_wait
:              29   active+remapped+backfilling
:              25   active+remapped+backfill_wait+backfill_toofull
:
:   io:
:     recovery: 174 MiB/s, 43 objects/s
:
:
: -Yenya
:
: : Jan Kasprzak wrote:
: : : ----- Original Message -----
: : : From: "Caspar Smit" <casparsmit@xxxxxxxxxxx>
: : : To: "Jan Kasprzak" <kas@xxxxxxxxxx>
: : : Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
: : : Sent: Thursday, 31 January, 2019 15:43:07
: : : Subject: Re: backfill_toofull after adding new OSDs
: : :
: : : Hi Jan,
: : :
: : : You might be hitting the same issue as Wido here:
: : :
: : : https://www.spinics.net/lists/ceph-users/msg50603.html
: : :
: : : Kind regards,
: : : Caspar
: : :
: : : On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <kas@xxxxxxxxxx> wrote:
: : :
: : : Hello, ceph users,
: : :
: : : I see the following HEALTH_ERR during cluster rebalance:
: : :
: : : Degraded data redundancy (low space): 8 pgs backfill_toofull
: : :
: : : Detailed description:
: : : I have upgraded my cluster to mimic and added 16 new bluestore OSDs
: : : on 4 hosts. The hosts are in a separate region in my crush map, and crush
: : : rules prevented data from being moved to the new OSDs. Now I want to move
: : : all data to the new OSDs (and possibly decommission the old filestore OSDs).
: : : I have created the following rule:
: : :
: : : # ceph osd crush rule create-replicated on-newhosts newhostsroot host
: : :
: : : after this, I am slowly moving the pools one-by-one to this new rule:
: : :
: : : # ceph osd pool set test-hdd-pool crush_rule on-newhosts
: : :
: : : When I do this, I get the above error. This is misleading, because
: : : ceph osd df does not suggest the OSDs are getting full (the most full
: : : OSD is about 41% full). After rebalancing is done, the HEALTH_ERR
: : : disappears. Why am I getting this error?
: : :
: : : # ceph -s
: : :   cluster:
: : :     id:     ...my UUID...
: : :     health: HEALTH_ERR
: : :             1271/3803223 objects misplaced (0.033%)
: : :             Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs degraded, 67 pgs undersized
: : :             Degraded data redundancy (low space): 8 pgs backfill_toofull
: : :
: : :   services:
: : :     mon: 3 daemons, quorum mon1,mon2,mon3
: : :     mgr: mon2(active), standbys: mon1, mon3
: : :     osd: 80 osds: 80 up, 80 in; 90 remapped pgs
: : :     rgw: 1 daemon active
: : :
: : :   data:
: : :     pools:   13 pools, 5056 pgs
: : :     objects: 1.27 M objects, 4.8 TiB
: : :     usage:   15 TiB used, 208 TiB / 224 TiB avail
: : :     pgs:     40124/3803223 objects degraded (1.055%)
: : :              1271/3803223 objects misplaced (0.033%)
: : :              4963 active+clean
: : :              41   active+recovery_wait+undersized+degraded+remapped
: : :              21   active+recovery_wait+undersized+degraded
: : :              17   active+remapped+backfill_wait
: : :              5    active+remapped+backfill_wait+backfill_toofull
: : :              3    active+remapped+backfill_toofull
: : :              2    active+recovering+undersized+remapped
: : :              2    active+recovering+undersized+degraded+remapped
: : :              1    active+clean+remapped
: : :              1    active+recovering+undersized+degraded
: : :
: : :   io:
: : :     client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
: : :     recovery: 2.0 MiB/s, 92 objects/s
: : :
: : : Thanks for any hint,
: : :
: : : -Yenya
: : :
: : : --
: : : | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
: : : | http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
: : : This is the world we live in: the way to deal with computers is to google
: : : the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
This is the world we live in: the way to deal with computers is to google
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com