Re: Help: pool not responding

I have tried every way to recover the pool (also marking OSDs out, scrubbing, etc.).
If there is no way to reset those four PGs, or to understand why they are not repairing themselves, I will destroy the pool.
But destroying an entire pool just to unblock 4 incomplete PGs is incredible.
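
The attempts so far looked roughly like the following (the PG id 2.3f and OSD id 0 below are placeholders):

    # check which PGs are stuck and why
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg 2.3f query

    # mark an OSD out and back in to force re-peering
    ceph osd out 0
    ceph osd in 0

    # scrub / repair the affected PG
    ceph pg scrub 2.3f
    ceph pg repair 2.3f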

Mario

2016-03-03 21:51 GMT+01:00 Dimitar Boichev <Dimitar.Boichev@xxxxxxxxxxxxx>:
But did you restart the whole cluster, or what?

Regards.
 
Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Skype: dimitar.boichev.axsmarine

On Mar 3, 2016, at 22:47, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

I used the init script to restart.

From: Dimitar Boichev
Sent: Thursday, 3 March 2016 21:44
To: Mario Giammarco
Cc: Oliver Dzombic; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Help: pool not responding

I have seen a lot of people (including myself) end up with PGs stuck in the "creating" state after force-creating them.

How did you restart Ceph?
Mine were created fine after I restarted the monitor nodes following a minor version upgrade.
Did you do it monitors first, OSDs second, and so on?
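
On a typical node that ordering looks roughly like this (daemon ids are examples; use whichever init system is actually in place):

    # monitors first, one node at a time
    systemctl restart ceph-mon@node1     # or: service ceph restart mon.node1
    ceph -s                              # wait for quorum to settle

    # then the OSDs, one at a time, checking health in between
    systemctl restart ceph-osd@0         # or: service ceph restart osd.0
    ceph -s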

Regards.


On Mar 3, 2016, at 13:13, Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

I have tried "force create". It says "creating" but at the end problem persists.
I have restarted ceph as usual.
I am evaluating ceph and I am shocked because it semeed a very robust filesystem and now for a glitch I have an entire pool blocked and there is no simple procedure to force a recovery.
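
(The "force create" here was the force_create_pg command, run once per stuck PG; the PG id below is a placeholder:)

    ceph pg force_create_pg 2.3f
    ceph pg dump | grep creating     # shows "creating", then the problem persists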

2016-03-02 18:31 GMT+01:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
Hi,

I also could not find a delete, only a create.

I found this here; it's basically your situation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032412.html
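
If it really comes down to removing the PGs by hand, one possible route is ceph-objectstore-tool on each OSD that holds a copy. This is only a rough sketch, not verified here: the PG id and paths are placeholders, the OSD has to be stopped first, and the exact options differ between Ceph versions.

    # on the OSD node, with the OSD stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --journal-path /var/lib/ceph/osd/ceph-0/journal \
        --pgid 2.3f --op remove

    # restart the OSD, then recreate the now-empty PG
    ceph pg force_create_pg 2.3f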

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 02.03.2016 at 18:28, Mario Giammarco wrote:
> Thanks for the info, even if it is bad news.
> Anyway, I am reading the docs again and I do not see a way to delete PGs.
> How can I remove them?
> Thanks,
> Mario
>
> 2016-03-02 17:59 GMT+01:00 Oliver Dzombic <info@xxxxxxxxxxxxxxxxx>:
>
>     Hi,
>
>     As I see your situation, somehow these 4 PGs got lost.
>
>     They will not recover, because they are incomplete, so there is no data
>     from which they could be recovered.
>
>     So all that is left is to delete these PGs.
>
>     Since all 3 OSDs are in and up, it does not seem like you can somehow
>     access these lost PGs.
>
>     --
>     Mit freundlichen Gruessen / Best regards
>
>     Oliver Dzombic
>     IP-Interactive
>
>     mailto:info@xxxxxxxxxxxxxxxxx
>
>     Anschrift:
>
>     IP Interactive UG ( haftungsbeschraenkt )
>     Zum Sonnenberg 1-3
>     63571 Gelnhausen
>
>     HRB 93402 beim Amtsgericht Hanau
>     Geschäftsführung: Oliver Dzombic
>
>     Steuer Nr.: 35 236 3622 1
>     UST ID: DE274086107
>
>
>     On 02.03.2016 at 17:45, Mario Giammarco wrote:
>     >
>     >
>     > Here it is:
>     >
>     >  cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>     >      health HEALTH_WARN
>     >             4 pgs incomplete
>     >             4 pgs stuck inactive
>     >             4 pgs stuck unclean
>     >             1 requests are blocked > 32 sec
>     >      monmap e8: 3 mons at
>     > {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>     >             election epoch 840, quorum 0,1,2 0,1,2
>     >      osdmap e2405: 3 osds: 3 up, 3 in
>     >       pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>     >             1090 GB used, 4481 GB / 5571 GB avail
>     >                  284 active+clean
>     >                    4 incomplete
>     >   client io 4008 B/s rd, 446 kB/s wr, 23 op/s
>     >
>     >
>     > 2016-03-02 9:31 GMT+01:00 Shinobu Kinjo <skinjo@xxxxxxxxxx>:
>     >
>     >     Is "ceph -s" still showing you same output?
>     >
>     >     >     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>     >     >      health HEALTH_WARN
>     >     >             4 pgs incomplete
>     >     >             4 pgs stuck inactive
>     >     >             4 pgs stuck unclean
>     >     >      monmap e8: 3 mons at
>     >     > {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>     >     >             election epoch 832, quorum 0,1,2 0,1,2
>     >     >      osdmap e2400: 3 osds: 3 up, 3 in
>     >     >       pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>     >     >             1090 GB used, 4481 GB / 5571 GB avail
>     >     >                  284 active+clean
>     >     >                    4 incomplete
>     >
>     >     Cheers,
>     >     S
>     >
>     >     ----- Original Message -----
>     >     From: "Mario Giammarco" <mgiammarco@xxxxxxxxx
>     <mailto:mgiammarco@xxxxxxxxx>
>     >     <mailto:mgiammarco@xxxxxxxxx <mailto:mgiammarco@xxxxxxxxx>>>
>     >     To: "Lionel Bouton" <lionel-subscription@xxxxxxxxxxx
>     <mailto:lionel-subscription@xxxxxxxxxxx>
>     >     <mailto:lionel-subscription@xxxxxxxxxxx
>     <mailto:lionel-subscription@xxxxxxxxxxx>>>
>     >     Cc: "Shinobu Kinjo" <skinjo@xxxxxxxxxx
>     <mailto:skinjo@xxxxxxxxxx> <mailto:skinjo@xxxxxxxxxx
>     <mailto:skinjo@xxxxxxxxxx>>>,
>     >     ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>     <mailto:ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>>
>     >     Sent: Wednesday, March 2, 2016 4:27:15 PM
>     >     Subject: Re: Help: pool not responding
>     >
>     >     Tried to set min_size=1 but unfortunately nothing has changed.
>     >     Thanks for the idea.
>     >
>     >     2016-02-29 22:56 GMT+01:00 Lionel Bouton
>     >     <lionel-subscription@xxxxxxxxxxx>:
>     >
>     >     > On 29/02/2016 at 22:50, Shinobu Kinjo wrote:
>     >     >
>     >     > the fact that they are optimized for benchmarks and certainly not
>     >     > Ceph OSD usage patterns (with or without internal journal).
>     >     >
>     >     > Are you assuming that SSHD is causing the issue?
>     >     > If you could elaborate on this more, it would be helpful.
>     >     >
>     >     >
>     >     > Probably not (unless they reveal themselves to be extremely unreliable
>     >     > with Ceph OSD usage patterns, which would be surprising to me).
>     >     >
>     >     > For incomplete PGs, the documentation seems good enough for what
>     >     > should be done:
>     >     > http://docs.ceph.com/docs/master/rados/operations/pg-states/
>     >     >
>     >     > The relevant text:
>     >     >
>     >     > *Incomplete* Ceph detects that a placement group is missing information
>     >     > about writes that may have occurred, or does not have any healthy copies.
>     >     > If you see this state, try to start any failed OSDs that may contain the
>     >     > needed information or temporarily adjust min_size to allow recovery.
>     >     >
>     >     > We don't have the full history, but the most probable cause of these
>     >     > incomplete PGs is that min_size is set to 2 or 3 and at some time the 4
>     >     > incomplete PGs didn't have as many replicas as the min_size value. So if
>     >     > setting min_size to 2 isn't enough, setting it to 1 should unfreeze them.
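>     >     >
>     >     > Concretely that is one command per pool, and it can be reverted once the
>     >     > PGs have recovered; "rbd" below is just a placeholder pool name:
>     >     >
>     >     >     ceph osd pool set rbd min_size 1
>     >     >     ceph osd pool get rbd min_size       # verify
>     >     >     ceph osd pool set rbd min_size 2     # restore once healthy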
>     >     >
>     >     > Lionel
>     >     >
>     >
>     >
>     >
>     >
>     >





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
