Sorry, I forgot to hit reply all.
That did it, I'm getting a "HEALTH_OK"!! Now I can move on with the process! Thanks guys, hopefully you won't see me back here too much ;)
On Wed, May 1, 2013 at 5:43 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
[ Please keep all discussions on the list. :) ]
Okay, so you've now got just 128 that are sad. Those are all in pool 2,
which I believe is "rbd"; you'll need to set your replication level to
1 on all pools and that should fix it. :)
Keep in mind that with 1x replication you've only got 1 copy of
everything though, so if you lose one disk you're going to lose data.
You really want to get enough disks to set 2x replication.
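
For example, assuming the default pool names (you may have already done
the first two), something like:

  ceph osd pool set data size 1
  ceph osd pool set metadata size 1
  ceph osd pool set rbd size 1
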
On Wed, May 1, 2013 at 2:34 PM, Wyatt Gorman
<wyattgorman@xxxxxxxxxxxxxxx> wrote:
> ceph -s
> health HEALTH_WARN 128 pgs degraded; 128 pgs stuck unclean
> monmap e1: 1 mons at {a=10.81.2.100:6789/0}, election epoch 1, quorum 0 a
> osdmap e40: 1 osds: 1 up, 1 in
> pgmap v759: 384 pgs: 256 active+clean, 128 active+degraded; 8699 bytes
> data, 3430 MB used, 47828 MB / 54002 MB avail
> mdsmap e41: 1/1/1 up {0=a=up:active}
>
>
> http://pastebin.com/0d7UM5s4
>
> Thanks for your help, Greg.
>
>
> On Wed, May 1, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Wed, May 1, 2013 at 1:32 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> > Hi Wyatt,
>> >
>> > This is almost certainly a configuration issue. If I recall, there is
>> > a min_size setting in the CRUSH rules for each pool that defaults to
>> > two, which you may also need to reduce to one. I don't have the
>> > documentation in front of me, so that's just off the top of my head...
>>
>> Hmm, no. The min_size should be set automatically to 1/2 of the
>> specified size (rounded up), which would be 1 in this case.
>> What's the full output of ceph -s? Can you pastebin the output of
>> "ceph pg dump" please?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> >
>> > Dino
>> >
>> >
>> > On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman
>> > <wyattgorman@xxxxxxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Okay! Dino, thanks for your response. I reduced my metadata pool size
>> >> and data pool size to 1, which eliminated the "recovery 21/42 degraded
>> >> (50.000%)" at the end of my HEALTH_WARN error. So now, when I run
>> >> "ceph health" I get the following:
>> >>
>> >> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>> >>
>> >> So this seems to be from one single root cause. Any ideas? Again, is
>> >> this a corrupted drive issue that I can clean up, or is this still a
>> >> Ceph configuration error?
>> >>
>> >>
>> >> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2gnt@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hi Wyatt,
>> >>>
>> >>> You need to reduce the replication level on your existing pools to 1,
>> >>> or bring up another OSD. The default configuration specifies a
>> >>> replication level of 2, and the default CRUSH rules want to place the
>> >>> two replicas on two distinct OSDs. With only one OSD, CRUSH can't find
>> >>> a placement for the second replica, so Ceph reports a degraded state.
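>> >>>
>> >>> If I remember right, you can check the current level of each pool and
>> >>> then lower it with something like:
>> >>>
>> >>>   ceph osd dump | grep 'rep size'
>> >>>   ceph osd pool set <pool> size 1
>> >>>
>> >>> (syntax from memory, so double-check against the docs)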
>> >>>
>> >>> Dino
>> >>>
>> >>>
>> >>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman
>> >>> <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >>>>
>> >>>> Well, those points solved the issues of the redefined host and the
>> >>>> unknown auth protocol. The
>> >>>>
>> >>>>
>> >>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42
>> >>>> degraded (50.000%)"
>> >>>>
>> >>>> error is still an issue, though. Is this something simple like some
>> >>>> hard drive corruption that I can clean up with a fsck, or is this a
>> >>>> Ceph issue?
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson
>> >>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>>
>> >>>>> Wyatt,
>> >>>>>
>> >>>>> A few notes:
>> >>>>>
>> >>>>> - Yes, the second "host = ceph" under mon.a is redundant and should
>> >>>>> be deleted.
>> >>>>>
>> >>>>> - "auth client required = cephx [osd]" should be simply
>> >>>>> auth client required = cephx".
>> >>>>>
>> >>>>> - Looks like you only have one OSD. You need at least as many OSDs
>> >>>>> as the highest replication level among your pools (and hopefully
>> >>>>> more).
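>> >>>>>
>> >>>>> If that trailing [osd] was meant to start a new section (just a
>> >>>>> guess based on the warning you pasted), that part of the file would
>> >>>>> look roughly like:
>> >>>>>
>> >>>>> auth client required = cephx
>> >>>>>
>> >>>>> [osd]
>> >>>>> osd journal size = 1000
>> >>>>> filestore xattr use omap = true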
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>>
>> >>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>> >>>>>>
>> >>>>>> Here is my ceph.conf. I just figured out that the second "host ="
>> >>>>>> isn't necessary, though it is like that in the 5-minute quick start
>> >>>>>> guide... (Perhaps I'll submit the couple of fixes that I've had to
>> >>>>>> implement so far.) That fixes the "redefined host" issue, but none
>> >>>>>> of the others.
>> >>>>>>
>> >>>>>> [global]
>> >>>>>> # For version 0.55 and beyond, you must explicitly enable or
>> >>>>>> # disable authentication with "auth" entries in [global].
>> >>>>>>
>> >>>>>> auth cluster required = cephx
>> >>>>>> auth service required = cephx
>> >>>>>> auth client required = cephx [osd]
>> >>>>>> osd journal size = 1000
>> >>>>>>
>> >>>>>> #The following assumes ext4 filesystem.
>> >>>>>> filestore xattr use omap = true
>> >>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>> # settings for mkcephfs so that it will create and mount the
>> >>>>>> # file system on a particular OSD for you. Remove the comment `#`
>> >>>>>> # character for the following settings and replace the values in
>> >>>>>> # braces with appropriate values, or leave the following settings
>> >>>>>> # commented out to accept the default values. You must specify
>> >>>>>> # the --mkfs option with mkcephfs in order for the deployment
>> >>>>>> # script to utilize the following settings, and you must define
>> >>>>>> # the 'devs' option for each osd instance; see below.
>> >>>>>> #osd mkfs type = {fs-type}
>> >>>>>> #osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>> >>>>>> #osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>> >>>>>> # For example, for ext4, the mount option might look like this:
>> >>>>>>
>> >>>>>> #osd mkfs options ext4 = user_xattr,rw,noatime
>> >>>>>> # Execute $ hostname to retrieve the name of your host, and
>> >>>>>> # replace {hostname} with the name of your host. For the
>> >>>>>> # monitor, replace {ip-address} with the IP address of your
>> >>>>>> # host.
>> >>>>>> [mon.a]
>> >>>>>> host = ceph
>> >>>>>> mon addr = 10.81.2.100:6789
>> >>>>>> [osd.0]
>> >>>>>> host = ceph
>> >>>>>>
>> >>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>> # settings for mkcephfs so that it will create and mount the
>> >>>>>> # file system on a particular OSD for you. Remove the comment
>> >>>>>> # `#` character for the following setting for each OSD and
>> >>>>>> # specify a path to the device if you use mkcephfs with the
>> >>>>>> # --mkfs option.
>> >>>>>>
>> >>>>>> #devs = {path-to-device}
>> >>>>>> [osd.1]
>> >>>>>> host = ceph
>> >>>>>> #devs = {path-to-device}
>> >>>>>> [mds.a]
>> >>>>>> host = ceph
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>> >>>>>> <mike.dawson@xxxxxxxxxxxxxxxx
>> >>>>>> <mailto:mike.dawson@xxxxxxxxxxxxxxxx>>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> Wyatt,
>> >>>>>>
>> >>>>>> Please post your ceph.conf.
>> >>>>>>
>> >>>>>> - mike
>> >>>>>>
>> >>>>>>
>> >>>>>> On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>> >>>>>>
>> >>>>>> Hi everyone,
>> >>>>>>
>> >>>>>> I'm setting up a test Ceph cluster and am having trouble getting it
>> >>>>>> running (great for testing, huh?). I went through the installation
>> >>>>>> on Debian squeeze and had to modify the mkcephfs script a bit,
>> >>>>>> because it calls monmaptool with too many parameters in the $args
>> >>>>>> variable (mine had "--add a [ip address]:[port] [osd1]" and I had to
>> >>>>>> get rid of the [osd1] part for the monmaptool command to take it).
>> >>>>>> Anyway, so I got it installed, started the service, waited a little
>> >>>>>> while for it to build the fs, and ran "ceph health". I got (and am
>> >>>>>> still getting after a day and a reboot) the following error. (Note:
>> >>>>>> I have also been getting the first line in various calls; I'm not
>> >>>>>> sure why it is complaining, since I followed the instructions...)
>> >>>>>>
>> >>>>>> warning: line 34: 'host' in section 'mon.a' redefined
>> >>>>>> 2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth protocol defined: [osd]
>> >>>>>> HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)
>> >>>>>>
>> >>>>>> Can anybody tell me the root of this issue, and how I can fix it?
>> >>>>>> Thank you!
>> >>>>>>
>> >>>>>> - Wyatt Gorman
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> ______________________________
>> >>> Dino Yancey
>> >>> 2GNT.com Admin
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > ______________________________
>> > Dino Yancey
>> > 2GNT.com Admin
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com