[ Please keep all discussions on the list. :) ]

Okay, so you've now got just 128 PGs that are sad. Those are all in
pool 2, which I believe is "rbd" — you'll need to set your replication
level to 1 on all pools and that should fix it. :)

Keep in mind that with 1x replication you've only got 1 copy of
everything, though, so if you lose one disk you're going to lose data.
You really want to get enough disks to set 2x replication.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, May 1, 2013 at 2:34 PM, Wyatt Gorman <wyattgorman@xxxxxxxxxxxxxxx> wrote:
> ceph -s
>    health HEALTH_WARN 128 pgs degraded; 128 pgs stuck unclean
>    monmap e1: 1 mons at {a=10.81.2.100:6789/0}, election epoch 1, quorum 0 a
>    osdmap e40: 1 osds: 1 up, 1 in
>    pgmap v759: 384 pgs: 256 active+clean, 128 active+degraded; 8699 bytes
>       data, 3430 MB used, 47828 MB / 54002 MB avail
>    mdsmap e41: 1/1/1 up {0=a=up:active}
>
> http://pastebin.com/0d7UM5s4
>
> Thanks for your help, Greg.
>
>
> On Wed, May 1, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Wed, May 1, 2013 at 1:32 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> > Hi Wyatt,
>> >
>> > This is almost certainly a configuration issue. If I recall, there is
>> > a min_size setting in the CRUSH rules for each pool that defaults to
>> > two, which you may also need to reduce to one. I don't have the
>> > documentation in front of me, so that's just off the top of my head...
>>
>> Hmm, no. The min_size should be set automatically to 1/2 of the
>> specified size (rounded up), which would be 1 in this case.
>> What's the full output of ceph -s? Can you pastebin the output of
>> "ceph pg dump" please?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> > Dino
>> >
>> > On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman
>> > <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >> Okay! Dino, thanks for your response. I reduced my metadata pool
>> >> size and data pool size to 1, which eliminated the "recovery 21/42
>> >> degraded (50.000%)" at the end of my HEALTH_WARN error. So now, when
>> >> I run "ceph health" I get the following:
>> >>
>> >> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>> >>
>> >> So this seems to be from one single root cause. Any ideas? Again, is
>> >> this a corrupted drive issue that I can clean up, or is this still a
>> >> ceph configuration error?
>> >>
>> >> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> >>> Hi Wyatt,
>> >>>
>> >>> You need to reduce the replication level on your existing pools to
>> >>> 1, or bring up another OSD. The default configuration specifies a
>> >>> replication level of 2, and the default CRUSH rules want to place a
>> >>> replica on two distinct OSDs. With one OSD, CRUSH can't determine
>> >>> placement for the replica and so Ceph is reporting a degraded state.
>> >>>
>> >>> Dino
>> >>>
>> >>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman
>> >>> <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >>>> Well, those points solved the issue of the redefined host and the
>> >>>> unidentified protocol. The
>> >>>>
>> >>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery
>> >>>> 21/42 degraded (50.000%)"
>> >>>>
>> >>>> error is still an issue, though. Is this something simple like
>> >>>> some hard drive corruption that I can clean up with an fsck, or is
>> >>>> this a ceph issue?
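[For reference, Greg's suggestion above maps to the commands below.
This is only a sketch, not output from the thread: the pool names
assume the Bobtail-era defaults ("data", "metadata", "rbd"), so check
"ceph osd lspools" for the actual names before changing anything.]

    ceph osd lspools                  # list pool ids and names
    ceph osd pool set data size 1     # keep a single copy per object
    ceph osd pool set metadata size 1
    ceph osd pool set rbd size 1
    ceph -s                           # PGs should settle to active+clean

With only one OSD, dropping size to 1 (or adding more OSDs) is the way
to reach HEALTH_OK; once more disks are in, the same command with
"size 2" restores 2x replication.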
>> >>>>
>> >>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson
>> >>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>> Wyatt,
>> >>>>>
>> >>>>> A few notes:
>> >>>>>
>> >>>>> - Yes, the second "host = ceph" under mon.a is redundant and
>> >>>>> should be deleted.
>> >>>>>
>> >>>>> - "auth client required = cephx [osd]" should be simply
>> >>>>> "auth client required = cephx".
>> >>>>>
>> >>>>> - Looks like you only have one OSD. You need at least as many
>> >>>>> OSDs as the highest replication level of your pools (and
>> >>>>> hopefully more).
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>> >>>>>> Here is my ceph.conf. I just figured out that the second "host ="
>> >>>>>> isn't necessary, though it is like that on the 5-minute quick
>> >>>>>> start guide... (Perhaps I'll submit the couple of fixes that I've
>> >>>>>> had to implement so far.) That fixes the "redefined host" issue,
>> >>>>>> but none of the others.
>> >>>>>>
>> >>>>>> [global]
>> >>>>>>     # For version 0.55 and beyond, you must explicitly enable or
>> >>>>>>     # disable authentication with "auth" entries in [global].
>> >>>>>>
>> >>>>>>     auth cluster required = cephx
>> >>>>>>     auth service required = cephx
>> >>>>>>     auth client required = cephx [osd]
>> >>>>>>     osd journal size = 1000
>> >>>>>>
>> >>>>>>     # The following assumes ext4 filesystem.
>> >>>>>>     filestore xattr use omap = true
>> >>>>>>
>> >>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>>     # settings for mkcephfs so that it will create and mount the
>> >>>>>>     # file system on a particular OSD for you. Remove the comment
>> >>>>>>     # `#` character for the following settings and replace the
>> >>>>>>     # values in braces with appropriate values, or leave the
>> >>>>>>     # following settings commented out to accept the default
>> >>>>>>     # values. You must specify the --mkfs option with mkcephfs
>> >>>>>>     # in order for the deployment script to utilize the following
>> >>>>>>     # settings, and you must define the 'devs' option for each
>> >>>>>>     # osd instance; see below.
>> >>>>>>
>> >>>>>>     #osd mkfs type = {fs-type}
>> >>>>>>     #osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>> >>>>>>     #osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>> >>>>>>
>> >>>>>>     # For example, for ext4, the mount option might look like this:
>> >>>>>>     #osd mkfs options ext4 = user_xattr,rw,noatime
>> >>>>>>
>> >>>>>>     # Execute $ hostname to retrieve the name of your host, and
>> >>>>>>     # replace {hostname} with the name of your host. For the
>> >>>>>>     # monitor, replace {ip-address} with the IP address of your
>> >>>>>>     # host.
>> >>>>>>
>> >>>>>> [mon.a]
>> >>>>>>     host = ceph
>> >>>>>>     mon addr = 10.81.2.100:6789
>> >>>>>>
>> >>>>>> [osd.0]
>> >>>>>>     host = ceph
>> >>>>>>
>> >>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>>     # settings for mkcephfs so that it will create and mount the
>> >>>>>>     # file system on a particular OSD for you. Remove the comment
>> >>>>>>     # `#` character for the following setting for each OSD and
>> >>>>>>     # specify a path to the device if you use mkcephfs with the
>> >>>>>>     # --mkfs option.
>> >>>>>>     #devs = {path-to-device}
>> >>>>>>
>> >>>>>> [osd.1]
>> >>>>>>     host = ceph
>> >>>>>>     #devs = {path-to-device}
>> >>>>>>
>> >>>>>> [mds.a]
>> >>>>>>     host = ceph
>> >>>>>>
>> >>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>> >>>>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>>>> Wyatt,
>> >>>>>>>
>> >>>>>>> Please post your ceph.conf.
>> >>>>>>>
>> >>>>>>> - mike
>> >>>>>>>
>> >>>>>>> On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>> >>>>>>>> Hi everyone,
>> >>>>>>>>
>> >>>>>>>> I'm setting up a test ceph cluster and am having trouble
>> >>>>>>>> getting it running (great for testing, huh?). I went through
>> >>>>>>>> the installation on Debian squeeze, and had to modify the
>> >>>>>>>> mkcephfs script a bit because it calls monmaptool with too many
>> >>>>>>>> parameters in the $args variable (mine had "--add a
>> >>>>>>>> [ip address]:[port] [osd1]" and I had to get rid of the [osd1]
>> >>>>>>>> part for the monmaptool command to take it). Anyway, I got it
>> >>>>>>>> installed, started the service, waited a little while for it to
>> >>>>>>>> build the fs, and ran "ceph health". I got (and am still
>> >>>>>>>> getting, after a day and a reboot) the following error. (Note:
>> >>>>>>>> I have also been getting the first line in various calls and am
>> >>>>>>>> unsure why it is complaining, since I followed the
>> >>>>>>>> instructions...)
>> >>>>>>>>
>> >>>>>>>> warning: line 34: 'host' in section 'mon.a' redefined
>> >>>>>>>> 2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth
>> >>>>>>>> protocol defined: [osd]
>> >>>>>>>> HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery
>> >>>>>>>> 21/42 degraded (50.000%)
>> >>>>>>>>
>> >>>>>>>> Can anybody tell me the root of this issue, and how I can fix
>> >>>>>>>> it? Thank you!
>> >>>>>>>>
>> >>>>>>>> - Wyatt Gorman
>> >>>
>> >>> --
>> >>> ______________________________
>> >>> Dino Yancey
>> >>> 2GNT.com Admin
>> >
>> > --
>> > ______________________________
>> > Dino Yancey
>> > 2GNT.com Admin

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
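[For reference, a ceph.conf with Mike's notes applied might look
roughly like the sketch below. It is only a sketch: the host name
"ceph" and the monitor address come from the thread, the [osd] section
holds the settings that the stray "[osd]" token was evidently meant to
introduce, and [osd.1] only makes sense once a second OSD actually
exists.]

    [global]
            auth cluster required = cephx
            auth service required = cephx
            # no stray "[osd]" at the end of this line
            auth client required = cephx

    # every section header goes on its own line
    [osd]
            osd journal size = 1000
            # the following assumes an ext4 filesystem
            filestore xattr use omap = true

    [mon.a]
            host = ceph
            mon addr = 10.81.2.100:6789

    [osd.0]
            host = ceph

    [osd.1]
            host = ceph

    [mds.a]
            host = ceph

After editing the file, restarting the daemons (e.g. with
"service ceph -a restart") should make the "'host' ... redefined" and
"unknown auth protocol defined: [osd]" warnings go away; the degraded
PGs are a separate matter of pool size versus OSD count, as discussed
above.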