When an OSD restarts, every PG on it that had any data modified while it was down needs to recover once the OSD comes back up. This makes sure that
all new objects created and all existing objects modified while it was down get replicated. Both of those types of objects are counted in the undersized/degraded numbers.
Recovery and backfilling both draw from the osd_max_backfills setting. If that setting is 2, an OSD can only be backfilling and/or recovering 2 PGs at once. The PGs that need recovery usually take only a few moments to finish, but they have to wait for the backfilling PGs to finish before they can do their quick little tasks and clean up the undersized objects. If your cluster had been healthy (nothing backfilling) when you restarted the OSDs, the PGs would have recovered in a very short time and you'd have been back to HEALTH_OK very quickly.
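To make the waiting effect concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not Ceph's actual reservation logic: it assumes a simple first-come-first-served queue, and the PG names and durations (in seconds) are made up for illustration.

```python
import heapq

def finish_times(tasks, slots):
    """Toy FIFO model: tasks is a list of (name, duration) in queue order.

    Returns {name: finish_time} when at most `slots` tasks run at once.
    `slots` stands in for osd_max_backfills; everything else is hypothetical.
    """
    running = []  # min-heap holding the finish times of tasks occupying a slot
    done = {}
    for name, duration in tasks:
        # Start right away if a slot is free, otherwise when the earliest slot frees up.
        start = 0 if len(running) < slots else heapq.heappop(running)
        done[name] = start + duration
        heapq.heappush(running, done[name])
    return done

# Two long backfills are already queued ahead of two quick recoveries.
busy = finish_times(
    [("backfill-a", 600), ("backfill-b", 900), ("recovery-c", 5), ("recovery-d", 5)],
    slots=2,
)
# The same two recoveries on a cluster with nothing backfilling.
healthy = finish_times([("recovery-c", 5), ("recovery-d", 5)], slots=2)

print(busy)
print(healthy)
```

In this model the two 5-second recoveries cannot start until the first backfill releases its slot, whereas on the otherwise idle cluster they both finish after 5 seconds.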
Does that answer your question?
David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943
________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Nick Fisk [nick@xxxxxxxxxx]
Sent: Monday, December 05, 2016 9:38 AM
To: 'ceph-users'
Subject: PG's become undersize+degraded if OSD's restart during backfill
Hi,
I recently re-added some old OSDs by zapping them and reintroducing them into the cluster as new OSDs. I'm using Ansible to add
the OSDs, and because there was an outstanding config change, it restarted all OSDs on the host where I was adding them at the
end of the play.
I noticed something a bit strange when this happened. First, here is the ceph.log from just before the restart:
2016-12-05 15:53:49.234039 mon.0 10.1.2.71:6789/0 5394558 : cluster [INF] pgmap v11064938: 4352 pgs: 1 active+remapped+backfilling,
9 active+clean+scrubbing, 437 active+remapped+wait_backfill, 3 active+clean+scrubbing+deep, 3902 active+clean; 70290 GB data, 207 TB
used, 114 TB / 321 TB avail; 1796 kB/s rd, 449 B/s wr, 0 op/s; 112/58482219 objects degraded (0.000%); 5751145/58482219 objects
misplaced (9.834%); 150 MB/s, 37 objects/s recovering
2016-12-05 15:53:49.436154 mon.0 10.1.2.71:6789/0 5394559 : cluster [INF] osd.50 marked itself down
2016-12-05 15:53:49.436334 mon.0 10.1.2.71:6789/0 5394560 : cluster [INF] osd.59 marked itself down
2016-12-05 15:53:49.436555 mon.0 10.1.2.71:6789/0 5394561 : cluster [INF] osd.56 marked itself down
2016-12-05 15:53:49.437382 mon.0 10.1.2.71:6789/0 5394562 : cluster [INF] osd.54 marked itself down
2016-12-05 15:53:49.437560 mon.0 10.1.2.71:6789/0 5394563 : cluster [INF] osd.57 marked itself down
2016-12-05 15:53:49.437650 mon.0 10.1.2.71:6789/0 5394564 : cluster [INF] osd.52 marked itself down
2016-12-05 15:53:49.438224 mon.0 10.1.2.71:6789/0 5394565 : cluster [INF] osd.58 marked itself down
2016-12-05 15:53:49.438599 mon.0 10.1.2.71:6789/0 5394566 : cluster [INF] osd.49 marked itself down
2016-12-05 15:53:49.438717 mon.0 10.1.2.71:6789/0 5394567 : cluster [INF] osd.55 marked itself down
2016-12-05 15:53:49.439303 mon.0 10.1.2.71:6789/0 5394568 : cluster [INF] osd.48 marked itself down
2016-12-05 15:53:49.439399 mon.0 10.1.2.71:6789/0 5394569 : cluster [INF] osd.51 marked itself down
2016-12-05 15:53:49.439710 mon.0 10.1.2.71:6789/0 5394570 : cluster [INF] osd.53 marked itself down
2016-12-05 15:53:49.966611 mon.0 10.1.2.71:6789/0 5394571 : cluster [INF] osdmap e2089614: 60 osds: 47 up, 59 in
You can see there were only a handful of degraded objects and no undersized PGs.
During the period between the OSDs going down and coming back up, there were lots of messages like these:
2016-12-05 15:53:57.924148 osd.15 10.1.111.2:6800/4024 2734 : cluster [INF] 1.c58 continuing backfill to osd.49 from (2060554'272739,2089547'275741] MIN to 2089547'275741
2016-12-05 15:53:58.569751 osd.5 10.1.111.1:6866/1787721 4143 : cluster [INF] 1.bc6 continuing backfill to osd.10 from (2049483'242150,2089562'245156] MIN to 2089562'245156
2016-12-05 15:53:58.569760 osd.5 10.1.111.1:6866/1787721 4144 : cluster [INF] 1.bc6 continuing backfill to osd.28 from (2049483'242150,2089562'245156] MIN to 2089562'245156
2016-12-05 15:53:58.569827 osd.5 10.1.111.1:6866/1787721 4145 : cluster [INF] 1.bc6 continuing backfill to osd.50 from (2049490'242155,2089562'245156] MIN to 2089562'245156
2016-12-05 15:53:58.569883 osd.5 10.1.111.1:6866/1787721 4146 : cluster [INF] 1.add continuing backfill to osd.19 from (2064881'215318,2089235'218318] MIN to 2089235'218318
2016-12-05 15:53:58.569933 osd.5 10.1.111.1:6866/1787721 4147 : cluster [INF] 1.add continuing backfill to osd.40 from (2064881'215318,2089235'218318] MIN to 2089235'218318
2016-12-05 15:53:58.570026 osd.5 10.1.111.1:6866/1787721 4148 : cluster [INF] 1.add continuing backfill to osd.50 from (2064881'215318,2089235'218318] MIN to 2089235'218318
2016-12-05 15:53:58.570816 osd.5 10.1.111.1:6866/1787721 4149 : cluster [INF] 1.cc9 continuing backfill to osd.50 from (2067348'326147,2089547'329149] MIN to 2089547'329149
After the restart:
2016-12-05 15:54:30.983780 mon.0 10.1.2.71:6789/0 5394697 : cluster [INF] pgmap v11064986: 4352 pgs: 1
active+undersized+degraded+remapped+backfilling, 5 active+degraded, 142 active+recovery_wait+degraded, 1 remapped+peering, 191
active+undersized+degraded+remapped+wait_backfill, 11 active+clean+scrubbing, 246 active+remapped+wait_backfill, 3755 active+clean;
70290 GB data, 207 TB used, 114 TB / 321 TB avail; 74496 kB/s rd, 179 kB/s wr, 57 op/s; 819165/57663296 objects degraded (1.421%);
4930596/57663296 objects misplaced (8.551%); 5950 kB/s, 6 objects/s recovering
I have lots of undersized PGs and quite a lot of degraded objects. However, I notice that the degraded objects number does not
move for a while and then suddenly drops by several thousand, almost as if all of a PG's objects are marked healthy again the
moment that PG finishes backfilling.
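For anyone who wants to track that number over time, a small Python sketch along these lines could pull the degraded counts out of pgmap lines in ceph.log. The two sample strings are fragments copied from the pgmap lines above; the regular expression and the function name are assumptions made for illustration, not part of any Ceph tooling.

```python
import re

# Matches the "<degraded>/<total> objects degraded" part of a pgmap summary line.
DEGRADED = re.compile(r"(\d+)/(\d+) objects degraded")

def degraded_ratio(pgmap_line):
    """Return (degraded, total, percent) from a pgmap summary line, or None if absent."""
    match = DEGRADED.search(pgmap_line)
    if match is None:
        return None
    degraded, total = int(match.group(1)), int(match.group(2))
    return degraded, total, 100.0 * degraded / total

# Fragments of the pgmap lines logged before and after the restart.
before = "112/58482219 objects degraded (0.000%); 5751145/58482219 objects misplaced (9.834%)"
after = "819165/57663296 objects degraded (1.421%); 4930596/57663296 objects misplaced (8.551%)"

for label, line in (("before", before), ("after", after)):
    degraded, total, pct = degraded_ratio(line)
    print(f"{label}: {degraded} of {total} degraded ({pct:.3f}%)")
```

Running this over successive pgmap lines would show whether the count falls in steps as individual PGs complete, which is the pattern described above.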
And an example PG dump:
1.e8c 4554 0 4554 4554 0 18995052544 3040 3040 active+undersized+degraded+remapped+wait_backfill 2016-12-05 15:53:58.054089 2090202'516479 2090202:308795 [17,49,40] 17 [17,40] 17 2074023'511913 2016-12-04 22:51:46.035122 2074023'511913 2016-12-04 22:51:46.035122
OSD 49 is one of the OSDs I reintroduced.
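As a rough way to read that dump line, the up set is [17,49,40] and the acting set is [17,40]. The Python sketch below compares the two, assuming a replicated pool of size 3 (inferred from the three OSDs in the up set, not stated anywhere in this thread). The function name and the returned keys are hypothetical, chosen for illustration.

```python
def pg_flags(up, acting, pool_size):
    """Derive a few PG state hints from the up and acting sets of one PG.

    pool_size is an assumption supplied by the caller (the pool's replica count).
    """
    return {
        # Fewer OSDs currently serving the PG than the pool asks for.
        "undersized": len(acting) < pool_size,
        # The OSDs serving the PG differ from the set CRUSH wants.
        "remapped": up != acting,
        # OSDs that CRUSH wants but that are not serving the PG yet.
        "pending_osds": [osd for osd in up if osd not in acting],
    }

# Sets taken from the dump of PG 1.e8c above; pool_size=3 is assumed.
flags = pg_flags(up=[17, 49, 40], acting=[17, 40], pool_size=3)
print(flags)
```

Under that assumption the acting set holds only two of the three copies, and the only OSD still missing from it is 49.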
Questions
1. Are these PGs really undersized?
2. Why did restarting the OSDs cause the reported degraded objects to change?
Nick
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com