Just losing one disk doesn't automagically delete it from CRUSH, but your output listed 10 disks, so something else must have happened - did you delete the disk from the crush map as well? By default Ceph waits 300 secs (AFAIK) before marking a down OSD out; after that it will start to recover.
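That timeout is the mon_osd_down_out_interval option. A rough sketch of how to check it on a monitor, and how to stop OSDs from being marked out at all while you work on the cluster (the mon id "0" and the socket path below are just examples, adjust them to your setup):

ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok config get mon_osd_down_out_interval
ceph osd set noout      # nothing gets marked out, no rebalancing starts
ceph osd unset noout    # re-enable the normal behaviour afterwards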
Thank you for your reply; it lets me add my experience:
1) The other time this happened to me I had a cluster with min_size=2 and size=3, and the problem was the same. That time I set min_size=1 to recover the pool, but it did not help. So I do not see the advantage of keeping three copies when ceph can decide to discard all three.
2) I started with 11 hdds. One hard disk failed. Ceph waited forever for the disk to come back, but it is completely broken, so I followed the procedure to really remove it from the cluster (the removal sequence is sketched below). Even so, ceph did not recover.
3) I have 307 pgs, more than the 300 limit, but that is due to the fact that I had 11 hdds and now only 10. I will add more hdds after I repair the pool.
4) I have reduced the monitors to 3.
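For reference, the standard removal sequence looks roughly like this ("X" is only a placeholder for the failed OSD's id):

ceph osd out X                 # mark it out so its data starts to remap
# ideally wait here until "ceph -s" shows all pgs active+clean again
ceph osd crush remove osd.X    # remove it from the crush map
ceph auth del osd.X            # remove its key
ceph osd rm X                  # remove the OSD id itself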
On Wed, 29 Jun 2016 at 10:25, Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Wed, 29 Jun 2016 06:02:59 +0000 Mario Giammarco wrote:
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
^
And that's the root cause of all your woes.
The default replication size is 3 for a reason, and while I do run pools
with a replication of 2, they are either on HDD RAIDs or on extremely
trustworthy and well-monitored SSDs.
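If you want to move those pools back to the defaults later, something along these lines should do it (pool name "rbd" taken from your output; an untested sketch, repeat for the other pools):

ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2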
That said, something more than a single HDD failure must have happened
here; you should check the logs and backtrace all the steps you took after
that OSD failed.
You said there were 11 HDDs and your first ceph -s output showed:
---
osdmap e10182: 10 osds: 10 up, 10 in
----
And your crush map states the same.
So how and WHEN did you remove that OSD?
My suspicion would be it was removed before recovery was complete.
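To see what those incomplete pgs are actually waiting for, something like the following is usually the first step (the pg id 0.2a is just an example, take a real one from "ceph health detail"):

ceph health detail | grep incomplete
ceph pg 0.2a query    # check the recovery_state section, e.g. down_osds_we_would_probe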
Also, as I think was mentioned before, 7 mons are overkill; 3-5 would be a
saner number.
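Dropping the extra mons is one command each, plus stopping the daemon and cleaning up ceph.conf; a minimal sketch (the mon name "6" is just an example taken from your monmap):

ceph mon remove 6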
Christian
> rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
> pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
> pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
>
>
> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
> 5 1.81000 1.00000 1857G 984G 872G 53.00 0.86
> 6 1.81000 1.00000 1857G 1202G 655G 64.73 1.05
> 2 1.81000 1.00000 1857G 1158G 698G 62.38 1.01
> 3 1.35999 1.00000 1391G 906G 485G 65.12 1.06
> 4 0.89999 1.00000 926G 702G 223G 75.88 1.23
> 7 1.81000 1.00000 1857G 1063G 793G 57.27 0.93
> 8 1.81000 1.00000 1857G 1011G 846G 54.44 0.88
> 9 0.89999 1.00000 926G 573G 352G 61.91 1.01
> 0 1.81000 1.00000 1857G 1227G 629G 66.10 1.07
> 13 0.45000 1.00000 460G 307G 153G 66.74 1.08
> TOTAL 14846G 9136G 5710G 61.54
> MIN/MAX VAR: 0.86/1.23 STDDEV: 6.47
>
>
>
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
> http://pastebin.com/SvGfcSHb
> http://pastebin.com/gYFatsNS
> http://pastebin.com/VZD7j2vN
>
> I do not understand why I/O on ENTIRE cluster is blocked when only few
> pgs are incomplete.
>
> Many thanks,
> Mario
>
>
> On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
>
> > And ceph health detail
> >
> > Stefan
> >
> > Excuse my typo sent from my mobile phone.
> >
> > On 28.06.2016 at 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Mario,
> >
> > please give some more details:
> >
> > Please the output of:
> >
> > ceph osd pool ls detail
> > ceph osd df
> > ceph --version
> >
> > ceph -w for 10 seconds ( use http://pastebin.com/ please )
> >
> > ceph osd crush dump ( also pastebin pls )
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:info@xxxxxxxxxxxxxxxxx
> >
> > Address:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 at the district court of Hanau
> > Managing Director: Oliver Dzombic
> >
> > Tax No.: 35 236 3622 1
> > VAT ID: DE274086107
> >
> >
> > On 28.06.2016 at 18:59, Mario Giammarco wrote:
> >
> > Hello,
> >
> > this is the second time this has happened to me, I hope that someone can
> >
> > explain what I can do.
> >
> > Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> >
> >
> > One hdd goes down due to bad sectors.
> >
> > Ceph recovers but it ends with:
> >
> >
> > cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> >      health HEALTH_WARN
> >             3 pgs down
> >             19 pgs incomplete
> >             19 pgs stuck inactive
> >             19 pgs stuck unclean
> >             7 requests are blocked > 32 sec
> >      monmap e11: 7 mons at {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> >             election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
> >      osdmap e10182: 10 osds: 10 up, 10 in
> >       pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> >             9136 GB used, 5710 GB / 14846 GB avail
> >                 1005 active+clean
> >                   16 incomplete
> >                    3 down+incomplete
> >
> > Unfortunately "7 requests blocked" means no virtual machine can boot
> >
> > because ceph has stopped i/o.
> >
> >
> > I can accept to lose some data, but not ALL data!
> >
> > Can you help me please?
> >
> > Thanks,
> >
> > Mario
> >
> >
> >
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com