Please try with:

    ceph pg repair <pgid>

Most of the time this will do the trick. Good luck!
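If many PGs are inconsistent, you can queue a repair for each of them in
one go. This is only a rough sketch, assuming a Bourne-style shell and
"ceph health detail" output like the one quoted below; review the
matched PG list before running it:

    # repair every PG currently reported as active+clean+inconsistent
    ceph health detail \
        | awk '/active\+clean\+inconsistent/ {print $2}' \
        | while read -r pgid; do
              ceph pg repair "$pgid"
          done

One caveat: as far as I know, repair treats the primary's copy as
authoritative, and with two replicas per PG (your acting sets list two
OSDs each) a bad primary can overwrite a good replica. The manual
procedure in your link [1] shows how to find the good copy first, and a
"ceph pg deep-scrub <pgid>" afterwards will verify the result.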
> On 26 Sep 2016, at 22:44, Eugen Block <eblock@xxxxxx> wrote:
>
> (Sorry, sometimes I use the wrong shortcuts too quickly)
>
> Hi experts,
>
> I need your help. I have a running cluster with 19 OSDs and 3 MONs. I
> created a separate LVM volume for /var/lib/ceph on one of the nodes. I
> stopped the mon service on that node, rsynced the content to the newly
> created volume and restarted the monitor, but obviously I didn't do
> that correctly, as the cluster is now stuck in ERROR state and I can't
> repair the affected PGs.
> How would I do that correctly? I want to do the same on the remaining
> nodes, but without bringing the cluster into an error state.
>
> One thing I have already learned is to set the noout flag before
> stopping services, but what else do I need to do?
>
> But now that the cluster is in an error state, how can I repair it?
> The current status is:
>
> ---cut here---
> ceph@node01:~/ceph-deploy> ceph -s
>     cluster 655cb05a-435a-41ba-83d9-8549f7c36167
>      health HEALTH_ERR
>             16 pgs inconsistent
>             261 scrub errors
>      monmap e7: 3 mons at {mon1=192.168.160.15:6789/0,mon2=192.168.160.17:6789/0,mon3=192.168.160.16:6789/0}
>             election epoch 356, quorum 0,1,2 mon1,mon2,mon3
>      osdmap e3394: 19 osds: 19 up, 19 in
>       pgmap v7105355: 8432 pgs, 15 pools, 1003 GB data, 205 kobjects
>             2114 GB used, 6038 GB / 8153 GB avail
>                 8413 active+clean
>                   16 active+clean+inconsistent
>                    3 active+clean+scrubbing+deep
>   client io 0 B/s rd, 136 kB/s wr, 34 op/s
>
> ceph@ndesan01:~/ceph-deploy> ceph health detail
> HEALTH_ERR 16 pgs inconsistent; 261 scrub errors
> pg 1.ffa is active+clean+inconsistent, acting [16,5]
> pg 1.cc9 is active+clean+inconsistent, acting [5,18]
> pg 1.bb1 is active+clean+inconsistent, acting [15,5]
> pg 1.ac4 is active+clean+inconsistent, acting [0,5]
> pg 1.a46 is active+clean+inconsistent, acting [13,4]
> pg 1.a16 is active+clean+inconsistent, acting [5,18]
> pg 1.9e4 is active+clean+inconsistent, acting [13,9]
> pg 1.9b7 is active+clean+inconsistent, acting [5,6]
> pg 1.950 is active+clean+inconsistent, acting [0,9]
> pg 1.6db is active+clean+inconsistent, acting [15,5]
> pg 1.5f6 is active+clean+inconsistent, acting [17,5]
> pg 1.5c2 is active+clean+inconsistent, acting [8,4]
> pg 1.5bc is active+clean+inconsistent, acting [9,6]
> pg 1.505 is active+clean+inconsistent, acting [16,9]
> pg 1.3e6 is active+clean+inconsistent, acting [2,4]
> pg 1.32 is active+clean+inconsistent, acting [18,5]
> 261 scrub errors
> ---cut here---
>
> And the number of scrub errors is increasing, although I started with
> more than 400 scrub errors.
> What I have tried is to manually repair single PGs as described in
> [1]. But some of the broken PGs have no entries in the log file, so I
> don't have anything to look at.
> In case an object exists on one OSD but is missing on the other, how
> do I get it copied back? Everything I've tried so far hasn't really
> accomplished anything: the number of scrub errors decreased for a
> while, but it is increasing again, so no success at all.
>
> I'd be really grateful for your advice!
>
> Regards,
> Eugen
>
> [1] http://ceph.com/planet/ceph-manually-repair-object/
>
> --
> Eugen Block                             voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg                         e-mail  : eblock@xxxxxx
>
> Chairwoman of the Supervisory Board: Angelika Mozdzen
> Registered office and court of registration: Hamburg, HRB 90934
> Executive Board: Jens-U. Mozdzen
> VAT ID no. DE 814 013 983
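PS: Regarding moving /var/lib/ceph to a new volume on the remaining
nodes without ending up in HEALTH_ERR again, this is roughly the
sequence I would try. Just a sketch, assuming systemd-style units and
that the node also runs OSDs; the <id> values are placeholders for your
actual daemon IDs, and /mnt/newvol stands in for wherever you mount the
new LVM volume:

    ceph osd set noout                          # don't rebalance while daemons are down
    systemctl stop ceph-mon@<id> ceph-osd@<id>  # stop ALL daemons using /var/lib/ceph
    rsync -aX /var/lib/ceph/ /mnt/newvol/       # -X keeps xattrs, filestore stores metadata there
    # mount the new volume at /var/lib/ceph, then:
    systemctl start ceph-mon@<id> ceph-osd@<id>
    ceph osd unset noout

If the first copy was made without preserving xattrs, that alone could
explain the scrub errors you are seeing now.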