Just losing one disk doesn't automagically delete it from CRUSH, but your output listed 10 disks, so something else must have happened - did you delete the disk from the crush map as well? By default Ceph waits 300 secs (AFAIK) before marking a down OSD out; after that it will start to recover.
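That timeout is the mon_osd_down_out_interval option. A rough sketch of how to check it on a monitor, and how to stop OSDs from being marked out at all while you work on the cluster (the mon id "0" and the socket path below are just examples, adjust them to your setup):

ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok config get mon_osd_down_out_interval
ceph osd set noout      # nothing gets marked out, no rebalancing starts
ceph osd unset noout    # re-enable the normal behaviour afterwards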
Thank you for your reply; it lets me add my experience:
1) The other time this happened to me I had a cluster with min_size=2 and size=3, and the problem was the same. That time I set min_size=1 to recover the pool, but it did not help. So I do not see the advantage of keeping three copies when ceph can decide to discard all three.
2) I started with 11 hdds. One hard disk failed. Ceph waited forever for the disk to come back, but it is completely broken, so I followed the procedure to really remove it from the cluster (the removal sequence is sketched below). Even so, ceph did not recover.
3) I have 307 pgs, more than the 300 limit, but that is due to the fact that I had 11 hdds and now only 10. I will add more hdds after I repair the pool.
4) I have reduced the monitors to 3.
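For reference, the standard removal sequence looks roughly like this ("X" is only a placeholder for the failed OSD's id):

ceph osd out X                 # mark it out so its data starts to remap
# ideally wait here until "ceph -s" shows all pgs active+clean again
ceph osd crush remove osd.X    # remove it from the crush map
ceph auth del osd.X            # remove its key
ceph osd rm X                  # remove the OSD id itself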
On Wed, 29 Jun 2016 at 10:25, Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Wed, 29 Jun 2016 06:02:59 +0000 Mario Giammarco wrote:
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
^
And that's the root cause of all your woes.
The default replication size is 3 for a reason, and while I do run pools
with a replication of 2, they are either on HDD RAIDs or on extremely
trustworthy and well-monitored SSDs.
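If you want to move those pools back to the defaults later, something along these lines should do it (pool name "rbd" taken from your output; an untested sketch, repeat for the other pools):

ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2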
That said, something more than a single HDD failure must have happened
here; you should check the logs and backtrace all the steps you took after
that OSD failed.
You said there were 11 HDDs and your first ceph -s output showed:
---
osdmap e10182: 10 osds: 10 up, 10 in
----
And your crush map states the same.
So how and WHEN did you remove that OSD?
My suspicion would be it was removed before recovery was complete.
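To see what those incomplete pgs are actually waiting for, something like the following is usually the first step (the pg id 0.2a is just an example, take a real one from "ceph health detail"):

ceph health detail | grep incomplete
ceph pg 0.2a query    # check the recovery_state section, e.g. down_osds_we_would_probe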
Also, as I think was mentioned before, 7 mons are overkill; 3-5 would be a
saner number.
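Dropping the extra mons is one command each, plus stopping the daemon and cleaning up ceph.conf; a minimal sketch (the mon name "6" is just an example taken from your monmap):

ceph mon remove 6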
Christian
> rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
> pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
> pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
>
>
> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
> 5 1.81000 1.00000 1857G 984G 872G 53.00 0.86
> 6 1.81000 1.00000 1857G 1202G 655G 64.73 1.05
> 2 1.81000 1.00000 1857G 1158G 698G 62.38 1.01
> 3 1.35999 1.00000 1391G 906G 485G 65.12 1.06
> 4 0.89999 1.00000 926G 702G 223G 75.88 1.23
> 7 1.81000 1.00000 1857G 1063G 793G 57.27 0.93
> 8 1.81000 1.00000 1857G 1011G 846G 54.44 0.88
> 9 0.89999 1.00000 926G 573G 352G 61.91 1.01
> 0 1.81000 1.00000 1857G 1227G 629G 66.10 1.07
> 13 0.45000 1.00000 460G 307G 153G 66.74 1.08
> TOTAL 14846G 9136G 5710G 61.54
> MIN/MAX VAR: 0.86/1.23 STDDEV: 6.47
>
>
>
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
> http://pastebin.com/SvGfcSHb
> http://pastebin.com/gYFatsNS
> http://pastebin.com/VZD7j2vN
>
> I do not understand why I/O on ENTIRE cluster is blocked when only few
> pgs are incomplete.
>
> Many thanks,
> Mario
>
>
> On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
>
> > And ceph health detail
> >
> > Stefan
> >
> > Excuse my typo sent from my mobile phone.
> >
> > On 28.06.2016 at 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Mario,
> >
> > please give some more details:
> >
> > Please the output of:
> >
> > ceph osd pool ls detail
> > ceph osd df
> > ceph --version
> >
> > ceph -w for 10 seconds ( use http://pastebin.com/ please )
> >
> > ceph osd crush dump ( also pastebin pls )
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:info@xxxxxxxxxxxxxxxxx
> >
> > Address:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 at the district court of Hanau
> > Managing Director: Oliver Dzombic
> >
> > Tax No.: 35 236 3622 1
> > VAT ID: DE274086107
> >
> >
> > On 28.06.2016 at 18:59, Mario Giammarco wrote:
> >
> > Hello,
> >
> > this is the second time this has happened to me, I hope that someone can
> >
> > explain what I can do.
> >
> > Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> >
> >
> > One hdd goes down due to bad sectors.
> >
> > Ceph recovers but it ends with:
> >
> >
> > cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> >      health HEALTH_WARN
> >             3 pgs down
> >             19 pgs incomplete
> >             19 pgs stuck inactive
> >             19 pgs stuck unclean
> >             7 requests are blocked > 32 sec
> >      monmap e11: 7 mons at {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> >             election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
> >      osdmap e10182: 10 osds: 10 up, 10 in
> >       pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> >             9136 GB used, 5710 GB / 14846 GB avail
> >                 1005 active+clean
> >                   16 incomplete
> >                    3 down+incomplete
> >
> > Unfortunately "7 requests blocked" means no virtual machine can boot
> >
> > because ceph has stopped i/o.
> >
> >
> > I can accept to lose some data, but not ALL data!
> >
> > Can you help me please?
> >
> > Thanks,
> >
> > Mario
> >
> >
> >
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com