Thanks,
I can put the OSDs back in, but they do not stay in, and I am pretty sure they are not broken.
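(As a minimal sketch of what I will try next, assuming the OSDs are only being marked out again rather than their daemons crashing:

ceph osd set noout     # stop ceph from automatically marking down OSDs out
ceph osd in <id>       # bring the OSD back in

and then check the OSD log under /var/log/ceph/ to see why it left the cluster.)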
On Wed, 29 Jun 2016 at 12:07, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
hi,
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd in <id>
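The flags should then show up in the ceph -s output; once the cluster is healthy again they can be reverted with:

ceph osd unset noscrub
ceph osd unset nodeep-scrub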
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx
Address:
IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen
HRB 93402, district court (Amtsgericht) Hanau
Managing director: Oliver Dzombic
Tax no.: 35 236 3622 1
VAT ID: DE274086107
On 29.06.2016 at 12:00, Mario Giammarco wrote:
> Now the problem is that ceph has put two disks out because scrubbing has
> failed (I think it is not a disk fault but a consequence of the mark-complete).
> How can I:
> - disable scrubbing
> - put the two disks back in
>
> I will wait for the end of recovery anyway, to be sure it really works again
>
> On Wed, 29 Jun 2016 at 11:16, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:
>
> In fact I am worried because:
>
> 1) ceph runs under proxmox, and proxmox may decide to reboot a server
> if it is not responding
> 2) probably a server was rebooted while ceph was reconstructing
> 3) even using max=3 does not help
>
> Anyway, this is the "unofficial" procedure that I am using, much
> simpler than the blog post:
>
> 1) find the host where the PG is (see the commands sketched below)
> 2) stop ceph on that host
> 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete --data-path
> /var/lib/ceph/osd/ceph-9 --journal-path
> /var/lib/ceph/osd/ceph-9/journal
> 4) start ceph again
> 5) watch it finally reconstructing
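>
> For step 1, a minimal sketch of how the PG can be located, assuming the
> PG id (here 1.98) is taken from "ceph health detail":
>
> ceph health detail | grep incomplete   # list the incomplete PGs
> ceph pg map 1.98                       # show the up/acting OSDs for that PG
> ceph osd find 9                        # show which host carries osd.9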
>
> On Wed, 29 Jun 2016 at 11:11, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> removing ONE disk while your replication is 2 is no problem.
>
> You don't need to wait a single second to replace or remove it. It is
> not used anyway and is out/down, so from ceph's point of view it does
> not exist.
>
> ----------------
>
> But as Christian told you already, what we see now fits a scenario
> where you lost the OSD and either you did something, or something else
> happened, but the data was never recovered.
>
> Either because another OSD was broken, or because you did something.
>
> Maybe ceph never recovered because of the "too many PGs per OSD
> (307 > max 300)" warning.
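>
> (For reference, that figure is presumably just the PG arithmetic for this
> cluster: 3 pools x 512 PGs x replica size 2 spread over 10 OSDs gives
> 3 * 512 * 2 / 10 = 307.2 PG copies per OSD, slightly above the default
> warning threshold of 300, mon_pg_warn_max_per_osd.)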
>
> What I can see from http://pastebin.com/VZD7j2vN is that
>
> OSDs 5, 13, 9, 0, 6, 2 and 3, and maybe others, are the OSDs holding the
> incomplete data.
>
> That is 7 OSDs out of 10. So something happened to those OSDs or the
> data on them, and that had nothing to do with a single disk failing.
>
> Something else must have happened.
>
> And as Christian already wrote: you will have to go back through your
> logs until the point where things went down.
>
> Because the failure of a single OSD, no matter what your replication
> size is, can ( normally ) not harm the consistency of 7 other OSDs,
> i.e. 70% of your total cluster.
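>
> (One way to see what an incomplete PG is actually waiting for is to query
> it, for example:
>
> ceph pg 1.98 query
>
> the "recovery_state" section at the end usually names the peering state
> and the OSDs the PG is blocked on.)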
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
>
> On 29.06.2016 at 10:56, Mario Giammarco wrote:
> > Yes, I removed it from crush because it was broken. I waited 24
> > hours to see if ceph would heal itself. Then I removed the disk
> > completely (it was broken...) and I waited 24 hours again. Then I
> > started getting worried.
> > Are you saying that I should not remove a broken disk from the
> > cluster? Were 24 hours not enough?
> >
> > On Wed, 29 Jun 2016 at 10:53, Zoltan Arnold Nagy <zoltan@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Just losing one disk doesn’t automagically delete it from CRUSH,
> > but in the output you had 10 disks listed, so there must be
> > something else going on - did you delete the disk from the crush map
> > as well?
> >
> > Ceph by default waits 300 secs AFAIK to mark an OSD out, after which
> > it will start to recover.
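> >
> > (That timeout is presumably mon_osd_down_out_interval, which defaults to
> > 300 seconds; it can be checked on a monitor node with, for example,
> > "ceph daemon mon.<id> config get mon_osd_down_out_interval".)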
> >
> >
> >> On 29 Jun 2016, at 10:42, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:
> >>
> >> I thank you for your reply, so I can add my experience:
> >>
> >> 1) The other time this happened to me I had a cluster with min_size=2
> >> and size=3 and the problem was the same. That time I set min_size=1
> >> to recover the pool (see the note after this list), but it did not
> >> help. So I do not understand the advantage of keeping three copies
> >> when ceph can decide to discard all three.
> >> 2) I started with 11 HDDs. One hard disk failed. Ceph waited forever
> >> for the hard disk to come back, but the disk was really completely
> >> broken, so I followed the procedure to actually delete it from the
> >> cluster. Anyway, ceph did not recover.
> >> 3) I have 307 PGs per OSD, more than the 300 limit, but that is
> >> because I had 11 HDDs and now only 10. I will add more HDDs after I
> >> repair the pool.
> >> 4) I have reduced the monitors to 3.
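> >>
> >> (For item 1: min_size is the usual pool-level setting, changed with e.g.
> >> "ceph osd pool set rbd min_size 1", using the rbd pool as an example.)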
> >>
> >>
> >>
> >> On Wed, 29 Jun 2016 at 10:25, Christian Balzer <chibi@xxxxxxx> wrote:
> >>
> >>
> >> Hello,
> >>
> >> On Wed, 29 Jun 2016 06:02:59 +0000 Mario Giammarco wrote:
> >>
> >> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> >> ^
> >> And that's the root cause of all your woes.
> >> The default replication size is 3 for a reason, and while I do run
> >> pools with replication of 2, they are either HDD RAIDs or extremely
> >> trustworthy and well-monitored SSDs.
> >>
> >> That said, something more than a single HDD failure must have
> >> happened here; you should check the logs and retrace all the steps
> >> you took after that OSD failed.
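> >>
> >> (A minimal sketch of what moving a pool back to 3 copies would look
> >> like, once the cluster is healthy again and assuming there is enough
> >> free capacity for the extra replicas:
> >>
> >> ceph osd pool set rbd size 3
> >> ceph osd pool set rbd min_size 2
> >>
> >> repeated for each pool.)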
> >>
> >> You said there were 11 HDDs and your first ceph -s output showed:
> >> ---
> >> osdmap e10182: 10 osds: 10 up, 10 in
> >> ---
> >> And your crush map states the same.
> >>
> >> So how and WHEN did you remove that OSD?
> >> My suspicion would be it was removed before recovery was complete.
> >>
> >> Also, as I think was mentioned before, 7 mons are overkill; 3-5 would
> >> be a saner number.
> >>
> >> Christian
> >>
> >> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool stripe_width 0
> >> > removed_snaps [1~3]
> >> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> >> > rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool stripe_width 0
> >> > removed_snaps [1~3]
> >> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> >> > rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool stripe_width 0
> >> > removed_snaps [1~3]
> >> >
> >> >
> >> > ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
> >> > 5 1.81000 1.00000 1857G 984G 872G 53.00 0.86
> >> > 6 1.81000 1.00000 1857G 1202G 655G 64.73 1.05
> >> > 2 1.81000 1.00000 1857G 1158G 698G 62.38 1.01
> >> > 3 1.35999 1.00000 1391G 906G 485G 65.12 1.06
> >> > 4 0.89999 1.00000 926G 702G 223G 75.88 1.23
> >> > 7 1.81000 1.00000 1857G 1063G 793G 57.27 0.93
> >> > 8 1.81000 1.00000 1857G 1011G 846G 54.44 0.88
> >> > 9 0.89999 1.00000 926G 573G 352G 61.91 1.01
> >> > 0 1.81000 1.00000 1857G 1227G 629G 66.10 1.07
> >> > 13 0.45000 1.00000 460G 307G 153G 66.74 1.08
> >> > TOTAL 14846G 9136G 5710G 61.54
> >> > MIN/MAX VAR: 0.86/1.23 STDDEV: 6.47
> >> >
> >> >
> >> >
> >> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >> >
> >> > http://pastebin.com/SvGfcSHb
> >> > http://pastebin.com/gYFatsNS
> >> > http://pastebin.com/VZD7j2vN
> >> >
> >> > I do not understand why I/O on the ENTIRE cluster is blocked when
> >> > only a few PGs are incomplete.
> >> >
> >> > Many thanks,
> >> > Mario
> >> >
> >> >
> >> > On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
> >> >
> >> > > And ceph health detail
> >> > >
> >> > > Stefan
> >> > >
> >> > > Excuse my typo sent from my mobile phone.
> >> > >
> >> > > On 28.06.2016 at 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> >> > >
> >> > > Hi Mario,
> >> > >
> >> > > please give some more details:
> >> > >
> >> > > Please post the output of:
> >> > >
> >> > > ceph osd pool ls detail
> >> > > ceph osd df
> >> > > ceph --version
> >> > >
> >> > > ceph -w for 10 seconds ( use http://pastebin.com/ please )
> >> > >
> >> > > ceph osd crush dump ( also pastebin pls )
> >> > >
> >> > > --
> >> > > Mit freundlichen Gruessen / Best regards
> >> > >
> >> > > Oliver Dzombic
> >> > > IP-Interactive
> >> > >
> >> > >
> >> > > On 28.06.2016 at 18:59, Mario Giammarco wrote:
> >> > >
> >> > > Hello,
> >> > >
> >> > > this is the second time this has happened to me; I hope that
> >> > > someone can explain what I can do.
> >> > >
> >> > > Proxmox ceph cluster with 8 servers, 11 HDDs. min_size=1, size=2.
> >> > >
> >> > > One HDD goes down due to bad sectors.
> >> > >
> >> > > Ceph recovers but it ends with:
> >> > >
> >> > >
> >> > > cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> >> > >  health HEALTH_WARN
> >> > >         3 pgs down
> >> > >         19 pgs incomplete
> >> > >         19 pgs stuck inactive
> >> > >         19 pgs stuck unclean
> >> > >         7 requests are blocked > 32 sec
> >> > >  monmap e11: 7 mons at
> >> > >         {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
> >> > >          2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,
> >> > >          4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,
> >> > >          6=192.168.0.207:6789/0}
> >> > >         election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
> >> > >  osdmap e10182: 10 osds: 10 up, 10 in
> >> > >  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> >> > >         9136 GB used, 5710 GB / 14846 GB avail
> >> > >             1005 active+clean
> >> > >               16 incomplete
> >> > >                3 down+incomplete
> >> > >
> >> > >
> >> > > Unfortunately "7 requests blocked" means no virtual machine can
> >> > > boot, because ceph has stopped I/O.
> >> > >
> >> > >
> >> > > I can accept losing some data, but not ALL data!
> >> > >
> >> > > Can you help me please?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Mario
> >> > >
> >> > >
> >> > >
> >>
> >>
> >> --
> >> Christian Balzer    Network/Systems Engineer
> >> chibi@xxxxxxx       Global OnLine Japan/Rakuten Communications
> >> http://www.gol.com/
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com