Hi Mario,

in my opinion you should:

1. fix "too many PGs per OSD (307 > max 300)"
2. stop scrubbing / deep scrubbing
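
A rough sketch of how I would do both (untested here; mon_pg_warn_max_per_osd
is the Hammer-era name of the warning threshold, 400 is only an example value,
and the scrub flags can be cleared again with "ceph osd unset ..." once
recovery has finished):

  # 2. pause scrubbing / deep scrubbing
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # 1. pg_num of an existing pool cannot be lowered, but the warning
  #    threshold on the monitors can be raised (persist it in ceph.conf
  #    if you want it to survive mon restarts)
  ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 400'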
How does your current ceph osd tree look?

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.06.2016 09:50, Mario Giammarco wrote:
> I have searched Google and I see that there is no official procedure.
>
> On Wed, 29 Jun 2016 at 09:43, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:
>
> I have read the post "incomplete pgs, oh my" many times.
> I think my case is different: the broken disk is completely broken.
> So how can I simply mark the incomplete PGs as complete?
> Should I stop Ceph first?
>
> On Wed, 29 Jun 2016 at 09:36, Tomasz Kuzemko <tomasz.kuzemko@xxxxxxxxxxxx> wrote:
>
> Hi,
> if you need fast access to your remaining data you can use
> ceph-objectstore-tool to mark those PGs as complete; however, this will
> irreversibly lose the missing data.
>
> If you understand the risks, this procedure is explained pretty well here:
> http://ceph.com/community/incomplete-pgs-oh-my/
>
> Since that article was written, ceph-objectstore-tool has gained a
> feature that was not available at the time, namely "--op mark-complete".
> I think it will be necessary in your case to call --op mark-complete
> after you import the PG into the temporary OSD (between steps 12 and 13).
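>
> Roughly, something like this (an untested sketch; <id> and <pgid> are
> placeholders, and the paths assume a default filestore layout under
> /var/lib/ceph/osd):
>
>   # identify the incomplete PGs and the OSDs they map to
>   ceph health detail | grep incomplete
>   ceph pg dump_stuck inactive
>
>   # with the temporary OSD stopped, after importing the PG (step 12 of
>   # the article) and before starting the OSD again (step 13):
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>       --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
>       --pgid <pgid> --op mark-complete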
>
> On 29.06.2016 09:09, Mario Giammarco wrote:
> > Now I have also discovered that, by mistake, someone has put production
> > data on a virtual machine in this cluster. I need Ceph to resume I/O so
> > that I can boot that virtual machine.
> > Can I mark the incomplete PGs as valid?
> > If needed, where can I buy some paid support?
> > Thanks again,
> > Mario
> >
> > On Wed, 29 Jun 2016 at 08:02, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:
> >
> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> > stripe_width 0
> >         removed_snaps [1~3]
> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> > stripe_width 0
> >         removed_snaps [1~3]
> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> > stripe_width 0
> >         removed_snaps [1~3]
> >
> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> >  5 1.81000  1.00000  1857G  984G  872G 53.00 0.86
> >  6 1.81000  1.00000  1857G 1202G  655G 64.73 1.05
> >  2 1.81000  1.00000  1857G 1158G  698G 62.38 1.01
> >  3 1.35999  1.00000  1391G  906G  485G 65.12 1.06
> >  4 0.89999  1.00000   926G  702G  223G 75.88 1.23
> >  7 1.81000  1.00000  1857G 1063G  793G 57.27 0.93
> >  8 1.81000  1.00000  1857G 1011G  846G 54.44 0.88
> >  9 0.89999  1.00000   926G  573G  352G 61.91 1.01
> >  0 1.81000  1.00000  1857G 1227G  629G 66.10 1.07
> > 13 0.45000  1.00000   460G  307G  153G 66.74 1.08
> >    TOTAL            14846G 9136G 5710G 61.54
> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> >
> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >
> > http://pastebin.com/SvGfcSHb
> > http://pastebin.com/gYFatsNS
> > http://pastebin.com/VZD7j2vN
> >
> > I do not understand why I/O on the ENTIRE cluster is blocked when only
> > a few PGs are incomplete.
> >
> > Many thanks,
> > Mario
> >
> > On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost AG
> > <s.priebe@xxxxxxxxxxxx> wrote:
> >
> > And ceph health detail
> >
> > Stefan
> >
> > Excuse my typos, sent from my mobile phone.
> >
> > On 28.06.2016 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi Mario,
> >>
> >> please give some more details:
> >>
> >> Please send the output of:
> >>
> >> ceph osd pool ls detail
> >> ceph osd df
> >> ceph --version
> >>
> >> ceph -w for 10 seconds (use http://pastebin.com/ please)
> >>
> >> ceph osd crush dump (also pastebin, please)
> >>
> >> --
> >> Mit freundlichen Gruessen / Best regards
> >>
> >> Oliver Dzombic
> >> IP-Interactive
> >>
> >> mailto:info@xxxxxxxxxxxxxxxxx
> >>
> >> On 28.06.2016 18:59, Mario Giammarco wrote:
> >>> Hello,
> >>> this is the second time this has happened to me, and I hope someone
> >>> can explain what I can do.
> >>> Proxmox Ceph cluster with 8 servers, 11 HDDs. min_size=1, size=2.
> >>>
> >>> One HDD went down due to bad sectors.
> >>> Ceph recovered, but it ended up with:
> >>>
> >>>     cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> >>>      health HEALTH_WARN
> >>>             3 pgs down
> >>>             19 pgs incomplete
> >>>             19 pgs stuck inactive
> >>>             19 pgs stuck unclean
> >>>             7 requests are blocked > 32 sec
> >>>      monmap e11: 7 mons at
> >>> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> >>>             election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
> >>>      osdmap e10182: 10 osds: 10 up, 10 in
> >>>       pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> >>>             9136 GB used, 5710 GB / 14846 GB avail
> >>>                 1005 active+clean
> >>>                   16 incomplete
> >>>                    3 down+incomplete
> >>>
> >>> Unfortunately "7 requests are blocked" means no virtual machine can
> >>> boot because Ceph has stopped I/O.
> >>>
> >>> I can accept losing some data, but not ALL data!
> >>> Can you help me please?
> >>> Thanks,
> >>> Mario
>
> --
> Tomasz Kuzemko
> tomasz.kuzemko@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com