Re: Help: pool not responding

I have restarted each host using init scripts. Is there another way?

2016-03-03 21:51 GMT+01:00 Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>:
But the whole cluster, or what?

Regards.
 
Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Skype: dimitar.boichev.axsmarine

On Mar 3, 2016, at 22:47, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

Used init scripts to restart.

From: Dimitar Boichev
Sent: Thursday, March 3, 2016 21:44
To: Mario Giammarco
Cc: Oliver Dzombic; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Help: pool not responding

I see a lot of people (including myself) ending up with PGs that are stuck in the "creating" state when you force create them.

How did you restart Ceph?
Mine were created fine after I restarted the monitor nodes following a minor version upgrade.
Did you do it monitors first, OSDs second, etc.?
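
For reference, a minimal sketch of that order on a sysvinit-based install (assuming the stock /etc/init.d/ceph script is what "init scripts" refers to; the daemon IDs below are placeholders):

    # restart the monitors first, one host at a time
    service ceph restart mon

    # then the OSDs, host by host (or a single daemon, e.g. osd.0)
    service ceph restart osd
    service ceph restart osd.0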

Regards.


On Mar 3, 2016, at 13:13, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

I have tried "force create". It says "creating", but in the end the problem persists.
I have restarted Ceph as usual.
I am evaluating Ceph and I am shocked: it seemed a very robust filesystem, and now, because of a glitch, I have an entire pool blocked and there is no simple procedure to force a recovery.
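
For reference, the two operations discussed in this thread would look roughly like the sketch below; the pool name "rbd" and the PG id "2.3f" are placeholders, not values taken from this cluster:

    # temporarily lower min_size on the affected pool (suggested earlier in the thread)
    ceph osd pool set rbd min_size 1

    # "force create" an incomplete PG; note this recreates it empty, so its data is given up
    ceph pg force_create_pg 2.3f

    # check whether it actually leaves the "creating"/"incomplete" state
    ceph -s
    ceph pg 2.3f query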

2016-03-02 18:31 GMT+01:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
Hi,

I also could not find any delete command, only a create.

I found this here; it's basically your situation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032412.html
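
There is indeed no "ceph pg delete" in the CLI; the usual first step is to inspect the affected PGs before force-creating anything. A minimal sketch (the PG id is a placeholder):

    # list unhealthy PGs and the OSDs they map to
    ceph health detail
    ceph pg dump_stuck inactive

    # ask one incomplete PG why peering cannot finish
    ceph pg 2.3f query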

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 02.03.2016 at 18:28, Mario Giammarco wrote:
> Thanks for the info, even if it is bad news.
> Anyway I am reading docs again and I do not see a way to delete PGs.
> How can I remove them?
> Thanks,
> Mario
>
> 2016-03-02 17:59 GMT+01:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
>
>     Hi,
>
>     As I see your situation, somehow these 4 PGs got lost.
>
>     They will not recover, because they are incomplete, so there is no data
>     from which they could be recovered.
>
>     So all that is left is to delete these PGs.
>
>     Since all 3 OSDs are in and up, it does not seem like there is any way to
>     access these lost PGs.
>
>     --
>     Mit freundlichen Gruessen / Best regards
>
>     Oliver Dzombic
>     IP-Interactive
>
>     mailto:info@xxxxxxxxxxxxxxxxx
>
>     Anschrift:
>
>     IP Interactive UG ( haftungsbeschraenkt )
>     Zum Sonnenberg 1-3
>     63571 Gelnhausen
>
>     HRB 93402 beim Amtsgericht Hanau
>     Geschäftsführung: Oliver Dzombic
>
>     Steuer Nr.: 35 236 3622 1
>     UST ID: DE274086107
>
>
>     On 02.03.2016 at 17:45, Mario Giammarco wrote:
>     >
>     >
>     > Here it is:
>     >
>     >  cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>     >      health HEALTH_WARN
>     >             4 pgs incomplete
>     >             4 pgs stuck inactive
>     >             4 pgs stuck unclean
>     >             1 requests are blocked > 32 sec
>     >      monmap e8: 3 mons at
>     > {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>     >             election epoch 840, quorum 0,1,2 0,1,2
>     >      osdmap e2405: 3 osds: 3 up, 3 in
>     >       pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>     >             1090 GB used, 4481 GB / 5571 GB avail
>     >                  284 active+clean
>     >                    4 incomplete
>     >   client io 4008 B/s rd, 446 kB/s wr, 23 op/s
>     >
>     >
>     > 2016-03-02 9:31 GMT+01:00 Shinobu Kinjo <skinjo@xxxxxxxxxx>:
>     >
>     >     Is "ceph -s" still showing you same output?
>     >
>     >     >     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>     >     >      health HEALTH_WARN
>     >     >             4 pgs incomplete
>     >     >             4 pgs stuck inactive
>     >     >             4 pgs stuck unclean
>     >     >      monmap e8: 3 mons at
>     >     > {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>     >     >             election epoch 832, quorum 0,1,2 0,1,2
>     >     >      osdmap e2400: 3 osds: 3 up, 3 in
>     >     >       pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>     >     >             1090 GB used, 4481 GB / 5571 GB avail
>     >     >                  284 active+clean
>     >     >                    4 incomplete
>     >
>     >     Cheers,
>     >     S
>     >
>     >     ----- Original Message -----
>     >     From: "Mario Giammarco" <mgiammarco@xxxxxxxxx
>     <mailto:mgiammarco@xxxxxxxxx>
>     >     <mailto:mgiammarco@xxxxxxxxx <mailto:mgiammarco@xxxxxxxxx>>>
>     >     To: "Lionel Bouton" <lionel-subscription@xxxxxxxxxxx
>     <mailto:lionel-subscription@xxxxxxxxxxx>
>     >     <mailto:lionel-subscription@xxxxxxxxxxx
>     <mailto:lionel-subscription@xxxxxxxxxxx>>>
>     >     Cc: "Shinobu Kinjo" <skinjo@xxxxxxxxxx
>     <mailto:skinjo@xxxxxxxxxx> <mailto:skinjo@xxxxxxxxxx
>     <mailto:skinjo@xxxxxxxxxx>>>,
>     >     ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>     <mailto:ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>>
>     >     Sent: Wednesday, March 2, 2016 4:27:15 PM
>     >     Subject: Re: Help: pool not responding
>     >
>     >     Tried to set min_size=1 but unfortunately nothing has changed.
>     >     Thanks for the idea.
>     >
>     >     2016-02-29 22:56 GMT+01:00 Lionel Bouton <lionel-subscription@xxxxxxxxxxx>:
>     >
>     >     > On 29/02/2016 at 22:50, Shinobu Kinjo wrote:
>     >     >
>     >     > the fact that they are optimized for benchmarks and certainly not
>     >     > Ceph OSD usage patterns (with or without internal journal).
>     >     >
>     >     > Are you assuming that SSHD is causing the issue?
>     >     > If you could elaborate on this more, it would be helpful.
>     >     >
>     >     >
>     >     > Probably not (unless they turn out to be extremely unreliable with Ceph
>     >     > OSD usage patterns, which would be surprising to me).
>     >     >
>     >     > For incomplete PGs the documentation seems good enough for what should
>     >     > be done:
>     >     > http://docs.ceph.com/docs/master/rados/operations/pg-states/
>     >     >
>     >     > The relevant text:
>     >     >
>     >     > *Incomplete* Ceph detects that a placement group is missing information
>     >     > about writes that may have occurred, or does not have any healthy copies.
>     >     > If you see this state, try to start any failed OSDs that may contain the
>     >     > needed information or temporarily adjust min_size to allow recovery.
>     >     >
>     >     > We don't have the full history, but the most probable cause of these
>     >     > incomplete PGs is that min_size is set to 2 or 3 and at some time the 4
>     >     > incomplete PGs didn't have as many replicas as the min_size value. So if
>     >     > setting min_size to 2 isn't enough, setting it to 1 should unfreeze them.
>     >     >
>     >     > Lionel
>     >     >
>     >
>     >
>     >
>     >
>     >




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
