> pg 9.7 is stuck unclean for 512936.160212, current state active+remapped, last acting [7,3,0]
> pg 7.84 is stuck unclean for 512623.894574, current state active+remapped, last acting [4,8,1]
> pg 8.1b is stuck unclean for 513164.616377, current state active+remapped, last acting [4,7,2]
> pg 7.7a is stuck unclean for 513162.316328, current state active+remapped, last acting [7,4,2]

Please execute:

for pg in 9.7 7.84 8.1b 7.7a; do ceph pg $pg query; done

Regards,

On Tue, Jan 10, 2017 at 7:31 AM, Christian Wuerdig <christian.wuerdig@xxxxxxxxx> wrote:
>
> On Tue, Jan 10, 2017 at 10:22 AM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
>>
>> Trying google with "ceph pg stuck in active and remapped" points to a couple of posts on this ML, typically indicating that it's a problem with the CRUSH map and Ceph being unable to satisfy the mapping rules. Your ceph -s output indicates that you're using replication of size 3 in your pools. You also said you had a custom CRUSH map - can you post it?
>>
>>
>> I've sent the file to you, since I'm not sure whether it contains sensitive data. Yes, I have replication of 3, and I did not customize the map myself.
>
>
> I received your map, but I'm not familiar enough with the details to give any particular advice on it - I just suggested posting your map in case someone more familiar with the CRUSH details might be able to spot something. Brad just provided a pointer, so that would be useful to try.
>
>>
>> I might be missing something here, but I don't quite see how you come to this statement. ceph osd df and ceph -s both show 16093 GB used and 39779 GB out of 55872 GB available. The sum of the first 3 OSDs' used space is, as you stated, 6181 GB, which is approx 38.4%, so quite close to your target of 33%.
>>
>>
>> Maybe I have to explain it another way:
>>
>> Directly after the backfill finished, I received this output:
>>
>>     health HEALTH_WARN
>>            4 pgs stuck unclean
>>            recovery 1698/58476648 objects degraded (0.003%)
>>            recovery 418137/58476648 objects misplaced (0.715%)
>>            noscrub,nodeep-scrub flag(s) set
>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>            election epoch 464, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>     osdmap e3086: 9 osds: 9 up, 9 in; 4 remapped pgs
>>            flags noscrub,nodeep-scrub
>>      pgmap v9928160: 320 pgs, 3 pools, 4809 GB data, 19035 kobjects
>>            16093 GB used, 39779 GB / 55872 GB avail
>>            1698/58476648 objects degraded (0.003%)
>>            418137/58476648 objects misplaced (0.715%)
>>                 316 active+clean
>>                   4 active+remapped
>>   client io 757 kB/s rd, 1 op/s
>>
>> # ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
>>  0 1.28899  1.00000  3724G  1924G  1799G 51.67 1.79
>>  1 1.57899  1.00000  3724G  2143G  1580G 57.57 2.00
>>  2 1.68900  1.00000  3724G  2114G  1609G 56.78 1.97
>>  3 6.78499  1.00000  7450G  1234G  6215G 16.57 0.58
>>  4 8.39999  1.00000  7450G  1221G  6228G 16.40 0.57
>>  5 9.51500  1.00000  7450G  1232G  6217G 16.54 0.57
>>  6 7.66499  1.00000  7450G  1258G  6191G 16.89 0.59
>>  7 9.75499  1.00000  7450G  2482G  4967G 33.33 1.16
>>  8 9.32999  1.00000  7450G  2480G  4969G 33.30 1.16
>>               TOTAL 55872G 16093G 39779G 28.80
>> MIN/MAX VAR: 0.57/2.00  STDDEV: 17.54
>>
>> Here we can see that the cluster is using 4809 GB of data and has 16093 GB raw used - or, put the other way, only 39779 GB available.
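
The "data" vs. "raw used" figures above can be sanity-checked with plain replication math and the standard ceph df command; a minimal sketch, using the GB values from the ceph -s output above (note this is only a rough check, and ceph df column names and semantics vary between Ceph releases):

# Rough replication math for the snapshot above; the pools use size 3:
echo "expected raw use: $((4809 * 3)) GB"   # -> 14427 GB
echo "reported raw use: 16093 GB"           # ~1.6 TB extra, e.g. old copies not yet cleaned up

# ceph df shows the same numbers broken down globally and per pool
# (column names differ between releases):
ceph df
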
>>
>> Two days later I saw:
>>
>>     health HEALTH_WARN
>>            4 pgs stuck unclean
>>            recovery 3486/58726035 objects degraded (0.006%)
>>            recovery 420024/58726035 objects misplaced (0.715%)
>>            noscrub,nodeep-scrub flag(s) set
>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>            flags noscrub,nodeep-scrub
>>      pgmap v9969059: 320 pgs, 3 pools, 4830 GB data, 19116 kobjects
>>            15150 GB used, 40722 GB / 55872 GB avail
>>            3486/58726035 objects degraded (0.006%)
>>            420024/58726035 objects misplaced (0.715%)
>>                 316 active+clean
>>                   4 active+remapped
>>
>> # ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
>>  0 1.28899  1.00000  3724G  1696G  2027G 45.56 1.68
>>  1 1.57899  1.00000  3724G  1705G  2018G 45.80 1.69
>>  2 1.68900  1.00000  3724G  1794G  1929G 48.19 1.78
>>  3 6.78499  1.00000  7450G  1239G  6210G 16.64 0.61
>>  4 8.39999  1.00000  7450G  1226G  6223G 16.46 0.61
>>  5 9.51500  1.00000  7450G  1237G  6212G 16.61 0.61
>>  6 7.66499  1.00000  7450G  1263G  6186G 16.96 0.63
>>  7 9.75499  1.00000  7450G  2493G  4956G 33.47 1.23
>>  8 9.32999  1.00000  7450G  2491G  4958G 33.44 1.23
>>               TOTAL 55872G 15150G 40722G 27.12
>> MIN/MAX VAR: 0.61/1.78  STDDEV: 13.54
>>
>>
>> As you can see, we are now using 4830 GB of data BUT raw used is only 15150 GB - or, put the other way, we now have 40722 GB free. You can see the change in the %USE of the OSDs. To me this looks like some data was lost, since Ceph did not do any backfill or other operation in between. That's the problem...
>>
>
> Ok, that output is indeed a bit different. However, as you should note, the actual data stored in the cluster goes from 4809 GB to 4830 GB. 4830 * 3 is actually only 14490 GB, so currently it's using a bit more space than strictly necessary. My guess would be that the data gets migrated to the new OSDs first before being deleted from the old OSDs, and as such it will transiently use up more space. Pretty sure that you didn't lose any data.
>
>>
>> On 09.01.2017 at 21:55, Christian Wuerdig <christian.wuerdig@xxxxxxxxx> wrote:
>>
>>
>> On Tue, Jan 10, 2017 at 8:23 AM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
>>>
>>> Hi all,
>>>
>>> Recently I added a new node with new OSDs to my cluster, which of course resulted in backfilling. At the end, there are 4 PGs left in the state active+remapped and I don't know what to do.
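
For stuck PGs like these, the usual first step (and what is being asked for at the top of this thread) is to list them and query their peering state; a short sketch, using the four PG IDs reported by this cluster:

# List every PG stuck unclean, with its state and acting OSD set:
ceph pg dump_stuck unclean

# Query the full state of each affected PG; the recovery_state section
# of the output typically shows why a PG cannot go active+clean:
for pg in 9.7 7.84 8.1b 7.7a; do
    ceph pg "$pg" query
done
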
>>>
>>> Here is how my cluster currently looks:
>>>
>>> ceph -s
>>>     health HEALTH_WARN
>>>            4 pgs stuck unclean
>>>            recovery 3586/58734009 objects degraded (0.006%)
>>>            recovery 420074/58734009 objects misplaced (0.715%)
>>>            noscrub,nodeep-scrub flag(s) set
>>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>            flags noscrub,nodeep-scrub
>>>      pgmap v9970276: 320 pgs, 3 pools, 4831 GB data, 19119 kobjects
>>>            15152 GB used, 40719 GB / 55872 GB avail
>>>            3586/58734009 objects degraded (0.006%)
>>>            420074/58734009 objects misplaced (0.715%)
>>>                 316 active+clean
>>>                   4 active+remapped
>>>   client io 643 kB/s rd, 7 op/s
>>>
>>> # ceph osd df
>>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
>>>  0 1.28899  1.00000  3724G  1697G  2027G 45.57 1.68
>>>  1 1.57899  1.00000  3724G  1706G  2018G 45.81 1.69
>>>  2 1.68900  1.00000  3724G  1794G  1929G 48.19 1.78
>>>  3 6.78499  1.00000  7450G  1240G  6209G 16.65 0.61
>>>  4 8.39999  1.00000  7450G  1226G  6223G 16.47 0.61
>>>  5 9.51500  1.00000  7450G  1237G  6212G 16.62 0.61
>>>  6 7.66499  1.00000  7450G  1264G  6186G 16.97 0.63
>>>  7 9.75499  1.00000  7450G  2494G  4955G 33.48 1.23
>>>  8 9.32999  1.00000  7450G  2491G  4958G 33.45 1.23
>>>                TOTAL 55872G 15152G 40719G 27.12
>>> MIN/MAX VAR: 0.61/1.78  STDDEV: 13.54
>>>
>>> # ceph health detail
>>> HEALTH_WARN 4 pgs stuck unclean; recovery 3586/58734015 objects degraded (0.006%); recovery 420074/58734015 objects misplaced (0.715%); noscrub,nodeep-scrub flag(s) set
>>> pg 9.7 is stuck unclean for 512936.160212, current state active+remapped, last acting [7,3,0]
>>> pg 7.84 is stuck unclean for 512623.894574, current state active+remapped, last acting [4,8,1]
>>> pg 8.1b is stuck unclean for 513164.616377, current state active+remapped, last acting [4,7,2]
>>> pg 7.7a is stuck unclean for 513162.316328, current state active+remapped, last acting [7,4,2]
>>> recovery 3586/58734015 objects degraded (0.006%)
>>> recovery 420074/58734015 objects misplaced (0.715%)
>>> noscrub,nodeep-scrub flag(s) set
>>>
>>> # ceph osd tree
>>> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 56.00693 root default
>>> -2  1.28899     host ceph1
>>>  0  1.28899         osd.0       up  1.00000          1.00000
>>> -3  1.57899     host ceph2
>>>  1  1.57899         osd.1       up  1.00000          1.00000
>>> -4  1.68900     host ceph3
>>>  2  1.68900         osd.2       up  1.00000          1.00000
>>> -5 32.36497     host ceph4
>>>  3  6.78499         osd.3       up  1.00000          1.00000
>>>  4  8.39999         osd.4       up  1.00000          1.00000
>>>  5  9.51500         osd.5       up  1.00000          1.00000
>>>  6  7.66499         osd.6       up  1.00000          1.00000
>>> -6 19.08498     host ceph5
>>>  7  9.75499         osd.7       up  1.00000          1.00000
>>>  8  9.32999         osd.8       up  1.00000          1.00000
>>>
>>> I'm using a customized CRUSH map because, as you can see, this cluster is not very optimal. ceph1, ceph2 and ceph3 are VMs on one physical host; ceph4 and ceph5 are separate physical hosts. So the idea is to spread 33% of the data across ceph1, ceph2 and ceph3, and the other 66% across ceph4 and ceph5.
>>>
>>> Everything went fine with the backfilling, but now those 4 PGs have been stuck in active+remapped for 2 days while the degraded objects increase.
>>>
>>> I restarted all OSDs one after another, but that did not really help. At first it showed no degraded objects, and then the number increased again.
>>>
>>> What can I do in order to get those PGs back to the active+clean state?
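
Since the working theory in this thread is that CRUSH cannot satisfy the placement rules for these 4 PGs, the map can also be pulled out of the cluster and tested offline with crushtool; a minimal sketch (the output paths are arbitrary, and rule number 0 is a placeholder for whichever ruleset the affected pools actually use):

# Extract the compiled CRUSH map and decompile it into readable text
# (this is also the file worth posting to the list):
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

# Simulate placements for a rule with 3 replicas; any inputs that CRUSH
# cannot map to 3 OSDs are reported as bad mappings:
crushtool -i /tmp/crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If the rule cannot place all 3 replicas for some inputs, that would be consistent with PGs staying in active+remapped.
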
>>> My idea was to increase the weight of an OSD a little bit in order to let Ceph calculate the map again - is this a good idea?
>>
>>
>> Trying google with "ceph pg stuck in active and remapped" points to a couple of posts on this ML, typically indicating that it's a problem with the CRUSH map and Ceph being unable to satisfy the mapping rules. Your ceph -s output indicates that you're using replication of size 3 in your pools. You also said you had a custom CRUSH map - can you post it?
>>
>>>
>>> ---
>>>
>>> On the other hand, I saw something very strange too: after the backfill was done (2 days ago), my ceph osd df looked like this:
>>>
>>> # ceph osd df
>>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
>>>  0 1.28899  1.00000  3724G  1924G  1799G 51.67 1.79
>>>  1 1.57899  1.00000  3724G  2143G  1580G 57.57 2.00
>>>  2 1.68900  1.00000  3724G  2114G  1609G 56.78 1.97
>>>  3 6.78499  1.00000  7450G  1234G  6215G 16.57 0.58
>>>  4 8.39999  1.00000  7450G  1221G  6228G 16.40 0.57
>>>  5 9.51500  1.00000  7450G  1232G  6217G 16.54 0.57
>>>  6 7.66499  1.00000  7450G  1258G  6191G 16.89 0.59
>>>  7 9.75499  1.00000  7450G  2482G  4967G 33.33 1.16
>>>  8 9.32999  1.00000  7450G  2480G  4969G 33.30 1.16
>>>                TOTAL 55872G 16093G 39779G 28.80
>>> MIN/MAX VAR: 0.57/2.00  STDDEV: 17.54
>>>
>>> While ceph -s was:
>>>
>>>     health HEALTH_WARN
>>>            4 pgs stuck unclean
>>>            recovery 1698/58476648 objects degraded (0.003%)
>>>            recovery 418137/58476648 objects misplaced (0.715%)
>>>            noscrub,nodeep-scrub flag(s) set
>>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>            election epoch 464, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>>     osdmap e3086: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>            flags noscrub,nodeep-scrub
>>>      pgmap v9928160: 320 pgs, 3 pools, 4809 GB data, 19035 kobjects
>>>            16093 GB used, 39779 GB / 55872 GB avail
>>>            1698/58476648 objects degraded (0.003%)
>>>            418137/58476648 objects misplaced (0.715%)
>>>                 316 active+clean
>>>                   4 active+remapped
>>>   client io 757 kB/s rd, 1 op/s
>>>
>>>
>>> As you can see, this looks completely different from my ceph osd df above -> the first three OSDs lost data (about 1 TB) without any backfill going on. If I add up the usage of osd.0, osd.1 and osd.2 here, it was 6181 GB. But there should only be around 33%, so this would be wrong.
>>
>>
>> I might be missing something here, but I don't quite see how you come to this statement. ceph osd df and ceph -s both show 16093 GB used and 39779 GB out of 55872 GB available. The sum of the first 3 OSDs' used space is, as you stated, 6181 GB, which is approx 38.4%, so quite close to your target of 33%.
>>
>>>
>>> My question on this is: is this a bug and did I really lose important data, or is this a Ceph cleanup action after the backfill?
>>>
>>> Thanks and regards,
>>> Marcus
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com