I've sent the file to you directly, since I'm not sure whether it contains sensitive data. Yes, I have a replication factor of 3, and I did not customize the CRUSH map myself.
Directly after finishing the backfill I received this output:

     health HEALTH_WARN
            4 pgs stuck unclean
            recovery 1698/58476648 objects degraded (0.003%)
            recovery 418137/58476648 objects misplaced (0.715%)
            noscrub,nodeep-scrub flag(s) set
     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
            election epoch 464, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
     osdmap e3086: 9 osds: 9 up, 9 in; 4 remapped pgs
            flags noscrub,nodeep-scrub
      pgmap v9928160: 320 pgs, 3 pools, 4809 GB data, 19035 kobjects
            16093 GB used, 39779 GB / 55872 GB avail
            1698/58476648 objects degraded (0.003%)
            418137/58476648 objects misplaced (0.715%)
                 316 active+clean
                   4 active+remapped
  client io 757 kB/s rd, 1 op/s

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
 0 1.28899  1.00000  3724G  1924G  1799G 51.67 1.79
 1 1.57899  1.00000  3724G  2143G  1580G 57.57 2.00
 2 1.68900  1.00000  3724G  2114G  1609G 56.78 1.97
 3 6.78499  1.00000  7450G  1234G  6215G 16.57 0.58
 4 8.39999  1.00000  7450G  1221G  6228G 16.40 0.57
 5 9.51500  1.00000  7450G  1232G  6217G 16.54 0.57
 6 7.66499  1.00000  7450G  1258G  6191G 16.89 0.59
 7 9.75499  1.00000  7450G  2482G  4967G 33.33 1.16
 8 9.32999  1.00000  7450G  2480G  4969G 33.30 1.16
              TOTAL  55872G 16093G 39779G 28.80
MIN/MAX VAR: 0.57/2.00  STDDEV: 17.54

Here we can see that the cluster holds 4809 GB of data and has 16093 GB raw used, or put the other way, only 39779 GB available. Two days later I saw:

     health HEALTH_WARN
            4 pgs stuck unclean
            recovery 3486/58726035 objects degraded (0.006%)
            recovery 420024/58726035 objects misplaced (0.715%)
            noscrub,nodeep-scrub flag(s) set
     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
            flags noscrub,nodeep-scrub
      pgmap v9969059: 320 pgs, 3 pools, 4830 GB data, 19116 kobjects
            15150 GB used, 40722 GB / 55872 GB avail
            3486/58726035 objects degraded (0.006%)
            420024/58726035 objects misplaced (0.715%)
                 316 active+clean
                   4 active+remapped

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
 0 1.28899  1.00000  3724G  1696G  2027G 45.56 1.68
 1 1.57899  1.00000  3724G  1705G  2018G 45.80 1.69
 2 1.68900  1.00000  3724G  1794G  1929G 48.19 1.78
 3 6.78499  1.00000  7450G  1239G  6210G 16.64 0.61
 4 8.39999  1.00000  7450G  1226G  6223G 16.46 0.61
 5 9.51500  1.00000  7450G  1237G  6212G 16.61 0.61
 6 7.66499  1.00000  7450G  1263G  6186G 16.96 0.63
 7 9.75499  1.00000  7450G  2493G  4956G 33.47 1.23
 8 9.32999  1.00000  7450G  2491G  4958G 33.44 1.23
              TOTAL  55872G 15150G 40722G 27.12
MIN/MAX VAR: 0.61/1.78  STDDEV: 13.54

As you can see, we are now using 4830 GB of data, BUT raw used is only 15150 GB, or put the other way, we now have 40722 GB free. The change is visible in the %USE of the individual OSDs. To me this looks like some data has been lost, since Ceph did not do any backfill or other operation in between. That's the problem...
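As a rough sanity check on those numbers (a minimal sketch, pure arithmetic on the figures above; it ignores journal and filesystem overhead, so raw used will never be exactly 3x the data size), with a replication factor of 3 the raw used figure should track roughly three times the logical data:

REPLICATION = 3  # pool size as stated above

# (label, logical data GB, reported raw used GB) copied from the two status outputs
snapshots = [
    ("after backfill", 4809, 16093),
    ("two days later", 4830, 15150),
]

for name, data_gb, raw_used_gb in snapshots:
    expected = data_gb * REPLICATION      # plain 3x replication, no overhead
    delta = raw_used_gb - expected        # how far above the 3x floor we sit
    print("%s: %d GB data -> expected ~%d GB raw, reported %d GB (%+d GB)"
          % (name, data_gb, expected, raw_used_gb, delta))

# after backfill: 4809 GB data -> expected ~14427 GB raw, reported 16093 GB (+1666 GB)
# two days later: 4830 GB data -> expected ~14490 GB raw, reported 15150 GB (+660 GB)

So the roughly 940 GB that disappeared between the two snapshots is the difference in overhead above the 3x replication floor.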