Hello,

On Thu, 1 Sep 2016 11:20:33 +0200 Ishmael Tsoaela wrote:

> thanks for the response
>
> > You really will want to spend more time reading documentation and this ML,
> > as well as using google to (re-)search things.
>
> I did do some reading on the errors but cannot understand why they do
> not clear even after so long.
>
> > In your previous mail you already mentioned a 92% full OSD, that should
> > combined with the various "full" warnings have impressed on you the need
> > to address this issue.
>
> > When your nodes all rebooted, did everything come back up?
>
> One host with 5 OSDs was down and came up later.
>
> > And if so (as the 15 osds: 15 up, 15 in suggest), how much separated in
> > time?
>
> about 7 hours
>
OK, so in those 7 hours (with 1/3rd of your cluster down), Ceph tried to
restore redundancy, but had not enough space to do so and got itself stuck
in a corner.

The lesson here is to either:
a) have enough space to cover the loss of one node (rack, etc.), or
b) set "mon_osd_down_out_subtree_limit = host" in your case, so that you
   can recover a failed node before re-balancing starts.
Of course b) assumes that you have 24/7 monitoring and access to your
cluster, so that restoring a failed node is likely faster than
re-balancing the data.

> True
>
> > Bad, Ceph wants to place data onto these 2 PGs, but their OSDs are too
> > full for that.
> > And until something changes it will be stuck there.
> > Your best bet is to add more OSDs, since you seem to be short on space
> > anyway. Or delete unneeded data.
> > Given your level of experience, I'd advise against playing with weights
> > and the respective "full" configuration options.
>
> I did reweight some OSDs but everything is back to normal. No config
> changes to the "full" options.
>
> I deleted about 900G this morning and prepared 3 OSDs, should I add them now?
>
More OSDs will both make things less likely to get full again and give the
nearfull OSDs a place to move data to.
However, they will also cause more data movement, so if your cluster is
busy, maybe do that during the night or over a weekend.

> > Are these numbers and the recovery io below still changing, moving along?
>
> original email:
>
> recovery 493335/3099981 objects degraded (15.914%)
> recovery 1377464/3099981 objects misplaced (44.435%)
>
> current email:
>
> recovery 389973/3096070 objects degraded (12.596%)
> recovery 1258984/3096070 objects misplaced (40.664%)
>
So there is progress; it may recover by itself after all.

Looking at your "df" output, only 7 OSDs seem to be nearfull now, is that
correct?
If so, that is definitely progress, it is just taking a lot of time to
recover.

If the progress should stop before the cluster can get healthy again,
write another mail with "ceph -s" and so forth for us to peruse.

Christian

> > Just to confirm, that's all the 15 OSDs your cluster ever had?
>
> yes
>
> > Output from "ceph osd df" and "ceph osd tree" please.
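(Purely as a sketch, not verified against your particular setup: "ceph osd
df tree" combines both of those views in one command, and "ceph health
detail" names the nearfull OSDs explicitly:

    ceph osd df tree
    ceph health detail

For option b) above, the setting goes into ceph.conf on the monitor nodes,
roughly:

    [mon]
    mon_osd_down_out_subtree_limit = host

And if you do add the new OSDs while the cluster is busy, backfill can be
throttled along the lines of:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

Exact option names and defaults vary a bit between releases, so check them
against the documentation for your version first.)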
>
> ID WEIGHT  REWEIGHT SIZE    USE    AVAIL %USE  VAR  PGS
>  3 0.90868  1.00000  930G   232G   698G 24.96 0.40 105
>  5 0.90868  1.00000  930G   139G   791G 14.99 0.24 139
>  6 0.90868  1.00000  930G 61830M   870G  6.49 0.10 138
>  0 0.90868  1.00000  930G   304G   625G 32.76 0.53 128
>  2 0.90868  1.00000  930G 24253M   906G  2.55 0.04 130
>  1 0.90868  1.00000  930G   793G   137G 85.22 1.37 162
>  4 0.90868  1.00000  930G   790G   140G 84.91 1.36 160
>  7 0.90868  1.00000  930G   803G   127G 86.34 1.39 144
> 10 0.90868  1.00000  930G   792G   138G 85.16 1.37 145
> 13 0.90868  1.00000  930G   811G   119G 87.17 1.40 163
> 15 0.90869  1.00000  930G   794G   136G 85.37 1.37 157
> 16 0.90869  1.00000  930G   757G   172G 81.45 1.31 159
> 17 0.90868  1.00000  930G   800G   129G 86.06 1.38 144
> 18 0.90869  1.00000  930G   786G   144G 84.47 1.36 166
> 19 0.90868  1.00000  930G   793G   137G 85.26 1.37 160
>              TOTAL  13958G  8683G  5274G 62.21
> MIN/MAX VAR: 0.04/1.40  STDDEV: 33.10
>
>
> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 13.63019 root default
> -2  4.54338     host nodeB
>  3  0.90868         osd.3        up  1.00000          1.00000
>  5  0.90868         osd.5        up  1.00000          1.00000
>  6  0.90868         osd.6        up  1.00000          1.00000
>  0  0.90868         osd.0        up  1.00000          1.00000
>  2  0.90868         osd.2        up  1.00000          1.00000
> -3  4.54338     host nodeC
>  1  0.90868         osd.1        up  1.00000          1.00000
>  4  0.90868         osd.4        up  1.00000          1.00000
>  7  0.90868         osd.7        up  1.00000          1.00000
> 10  0.90868         osd.10       up  1.00000          1.00000
> 13  0.90868         osd.13       up  1.00000          1.00000
> -6  4.54343     host nodeD
> 15  0.90869         osd.15       up  1.00000          1.00000
> 16  0.90869         osd.16       up  1.00000          1.00000
> 17  0.90868         osd.17       up  1.00000          1.00000
> 18  0.90869         osd.18       up  1.00000          1.00000
> 19  0.90868         osd.19       up  1.00000          1.00000
>
> On Thu, Sep 1, 2016 at 10:56 AM, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> > Hello,
> >
> > On Thu, 1 Sep 2016 10:18:39 +0200 Ishmael Tsoaela wrote:
> >
> > > Hi All,
> > >
> > > Can someone please decipher these errors for me? After all nodes rebooted
> > > in my cluster on Monday, the warning has not gone away.
> > >
> > You really will want to spend more time reading documentation and this ML,
> > as well as using google to (re-)search things.
> > Like searching for "backfill_toofull", "near full", etc.
> >
> > > Will the warning ever clear?
> > >
> > Unlikely.
> >
> > In your previous mail you already mentioned a 92% full OSD, that should
> > combined with the various "full" warnings have impressed on you the need
> > to address this issue.
> >
> > When your nodes all rebooted, did everything come back up?
> > And if so (as the 15 osds: 15 up, 15 in suggest), how much separated in
> > time?
> > My guess is that some nodes/OSDs were restarted a lot later than others.
> >
> > See inline:
> >
> > >     cluster df3f96d8-3889-4baa-8b27-cc2839141425
> > >      health HEALTH_WARN
> > >             2 pgs backfill_toofull
> > Bad, Ceph wants to place data onto these 2 PGs, but their OSDs are too
> > full for that.
> > And until something changes it will be stuck there.
> >
> > Your best bet is to add more OSDs, since you seem to be short on space
> > anyway. Or delete unneeded data.
> > Given your level of experience, I'd advise against playing with weights
> > and the respective "full" configuration options.
> >
> > >             532 pgs backfill_wait
> > >             3 pgs backfilling
> > >             330 pgs degraded
> > >             537 pgs stuck unclean
> > >             330 pgs undersized
> > >             recovery 493335/3099981 objects degraded (15.914%)
> > >             recovery 1377464/3099981 objects misplaced (44.435%)
> > Are these numbers and the recovery io below still changing, moving along?
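(Generic aside, nothing specific to your cluster: an easy way to watch those
recovery counters move without re-running commands by hand is

    ceph -w

or something like "watch -n 30 ceph -s"; either will show the degraded and
misplaced percentages ticking down while backfill makes progress.)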
> >
> > >             8 near full osd(s)
> >
> > 8 out of 15, definitely needs more OSDs.
> > Output from "ceph osd df" and "ceph osd tree" please.
> >
> > >      monmap e7: 3 mons at {Monitors}
> > >             election epoch 118, quorum 0,1,2 nodeB,nodeC,nodeD
> > >      osdmap e3922: 15 osds: 15 up, 15 in; 537 remapped pgs
> >
> > Just to confirm, that's all the 15 OSDs your cluster ever had?
> >
> > Christian
> >
> > >             flags sortbitwise
> > >       pgmap v2431741: 640 pgs, 3 pools, 3338 GB data, 864 kobjects
> > >             8242 GB used, 5715 GB / 13958 GB avail
> > >             493335/3099981 objects degraded (15.914%)
> > >             1377464/3099981 objects misplaced (44.435%)
> > >                  327 active+undersized+degraded+remapped+wait_backfill
> > >                  205 active+remapped+wait_backfill
> > >                  103 active+clean
> > >                    3 active+undersized+degraded+remapped+backfilling
> > >                    2 active+remapped+backfill_toofull
> > >   recovery io 367 MB/s, 96 objects/s
> > >   client io 5699 B/s rd, 23749 B/s wr, 2 op/s rd, 12 op/s wr
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com