Hi,

if you need fast access to your remaining data, you can use
ceph-objectstore-tool to mark those PGs as complete; however, this will
irreversibly lose the missing data. If you understand the risks, the
procedure is explained well here:

http://ceph.com/community/incomplete-pgs-oh-my/

Since that article was written, ceph-objectstore-tool has gained a feature
that was not available at the time: "--op mark-complete". I think that in
your case it will be necessary to call --op mark-complete after you import
the PG into the temporary OSD (between steps 12 and 13). A rough sketch of
the command sequence is at the bottom of this message, below the quoted
thread.

On 29.06.2016 09:09, Mario Giammarco wrote:
> Now I have also discovered that, by mistake, someone has put production
> data on a virtual machine of the cluster. I need Ceph to start I/O so
> I can boot that virtual machine.
> Can I mark the incomplete PGs as valid?
> If needed, where can I buy some paid support?
> Thanks again,
> Mario
>
> On Wed, 29 Jun 2016 at 08:02, Mario Giammarco
> <mgiammarco@xxxxxxxxx> wrote:
>
>     pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 9313 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>     pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 9314 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>     pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 10537 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>
>     ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
>      5 1.81000  1.00000 1857G  984G   872G 53.00 0.86
>      6 1.81000  1.00000 1857G 1202G   655G 64.73 1.05
>      2 1.81000  1.00000 1857G 1158G   698G 62.38 1.01
>      3 1.35999  1.00000 1391G  906G   485G 65.12 1.06
>      4 0.89999  1.00000  926G  702G   223G 75.88 1.23
>      7 1.81000  1.00000 1857G 1063G   793G 57.27 0.93
>      8 1.81000  1.00000 1857G 1011G   846G 54.44 0.88
>      9 0.89999  1.00000  926G  573G   352G 61.91 1.01
>      0 1.81000  1.00000 1857G 1227G   629G 66.10 1.07
>     13 0.45000  1.00000  460G  307G   153G 66.74 1.08
>        TOTAL            14846G 9136G 5710G 61.54
>     MIN/MAX VAR: 0.86/1.23 STDDEV: 6.47
>
>     ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
>     http://pastebin.com/SvGfcSHb
>     http://pastebin.com/gYFatsNS
>     http://pastebin.com/VZD7j2vN
>
>     I do not understand why I/O on the ENTIRE cluster is blocked when
>     only a few PGs are incomplete.
>
>     Many thanks,
>     Mario
>
>     On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost
>     AG <s.priebe@xxxxxxxxxxxx> wrote:
>
>         And ceph health detail
>
>         Stefan
>
>         Excuse my typo, sent from my mobile phone.
> > On 28.06.2016 at 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Mario,
>>
>> please give some more details:
>>
>> Please send the output of:
>>
>> ceph osd pool ls detail
>> ceph osd df
>> ceph --version
>>
>> ceph -w for 10 seconds (use http://pastebin.com/ please)
>>
>> ceph osd crush dump (also pastebin please)
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:info@xxxxxxxxxxxxxxxxx
>>
>> Address:
>>
>> IP Interactive UG (haftungsbeschraenkt)
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402, Amtsgericht Hanau
>> Managing director: Oliver Dzombic
>>
>> Tax no.: 35 236 3622 1
>> VAT ID: DE274086107
>>
>>
>> On 28.06.2016 at 18:59, Mario Giammarco wrote:
>>> Hello,
>>> this is the second time this has happened to me; I hope that
>>> someone can explain what I can do.
>>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>>>
>>> One hdd goes down due to bad sectors.
>>> Ceph recovers, but it ends with:
>>>
>>>     cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>>>      health HEALTH_WARN
>>>             3 pgs down
>>>             19 pgs incomplete
>>>             19 pgs stuck inactive
>>>             19 pgs stuck unclean
>>>             7 requests are blocked > 32 sec
>>>      monmap e11: 7 mons at
>>>     {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,
>>>     3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,
>>>     6=192.168.0.207:6789/0}
>>>             election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>>>      osdmap e10182: 10 osds: 10 up, 10 in
>>>      pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>>>             9136 GB used, 5710 GB / 14846 GB avail
>>>                 1005 active+clean
>>>                   16 incomplete
>>>                    3 down+incomplete
>>>
>>> Unfortunately "7 requests are blocked" means no virtual machine can
>>> boot, because Ceph has stopped I/O.
>>>
>>> I can accept losing some data, but not ALL data!
>>> Can you help me please?
>>> Thanks,
>>> Mario

--
Tomasz Kuzemko
tomasz.kuzemko@xxxxxxxxxxxx
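
For reference, here is a rough, untested sketch of that command sequence.
The PG ID (0.23), OSD numbers (5 as the source, 13 as the temporary OSD),
and file paths below are example values only; substitute the PGs reported
by "ceph health detail", and run ceph-objectstore-tool only while the OSD
in question is stopped:

    # find the incomplete PGs and where they map
    ceph health detail | grep incomplete

    # export the PG from the data store of an OSD that still holds a copy
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --journal-path /var/lib/ceph/osd/ceph-5/journal \
        --pgid 0.23 --op export --file /root/0.23.export

    # import it into the (stopped) temporary OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 \
        --journal-path /var/lib/ceph/osd/ceph-13/journal \
        --pgid 0.23 --op import --file /root/0.23.export

    # the step the article predates: mark the PG complete on the
    # temporary OSD; this is the irreversible part -- any objects
    # missing from this copy are lost for good
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 \
        --journal-path /var/lib/ceph/osd/ceph-13/journal \
        --pgid 0.23 --op mark-complete

Once the temporary OSD is started and backfill finishes, the PGs should go
active+clean and client I/O should unblock.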
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com