Sorry, I forgot to hit reply all.
That did it, I'm getting a "HEALTH_OK"!! Now I can move on with the process! Thanks guys, hopefully you won't see me back here too much ;)
On Wed, May 1, 2013 at 5:43 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
[ Please keep all discussions on the list. :) ]
Okay, so you've now got just 128 that are sad. Those are all in pool 2,
which I believe is "rbd"; you'll need to set your replication level to
1 on all pools and that should fix it. :)
Keep in mind that with 1x replication you've only got 1 copy of
everything though, so if you lose one disk you're going to lose data.
You really want to get enough disks to set 2x replication.
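
For example, assuming the default pool names (you may have already done
the first two), something like:

  ceph osd pool set data size 1
  ceph osd pool set metadata size 1
  ceph osd pool set rbd size 1
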
On Wed, May 1, 2013 at 2:34 PM, Wyatt Gorman
<wyattgorman@xxxxxxxxxxxxxxx> wrote:
> ceph -s
> health HEALTH_WARN 128 pgs degraded; 128 pgs stuck unclean
> monmap e1: 1 mons at {a=10.81.2.100:6789/0}, election epoch 1, quorum 0 a
> osdmap e40: 1 osds: 1 up, 1 in
> pgmap v759: 384 pgs: 256 active+clean, 128 active+degraded; 8699 bytes
> data, 3430 MB used, 47828 MB / 54002 MB avail
> mdsmap e41: 1/1/1 up {0=a=up:active}
>
>
> http://pastebin.com/0d7UM5s4
>
> Thanks for your help, Greg.
>
>
> On Wed, May 1, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Wed, May 1, 2013 at 1:32 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> > Hi Wyatt,
>> >
>> > This is almost certainly a configuration issue. If I recall, there is
>> > a min_size setting in the CRUSH rules for each pool that defaults to
>> > two, which you may also need to reduce to one. I don't have the
>> > documentation in front of me, so that's just off the top of my head...
>>
>> Hmm, no. The min_size should be set automatically to 1/2 of the
>> specified size (rounded up), which would be 1 in this case.
>> What's the full output of ceph -s? Can you pastebin the output of
>> "ceph pg dump" please?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> >
>> > Dino
>> >
>> >
>> > On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman
>> > <wyattgorman@xxxxxxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Okay! Dino, thanks for your response. I reduced my metadata pool size
>> >> and data pool size to 1, which eliminated the "recovery 21/42 degraded
>> >> (50.000%)" at the end of my HEALTH_WARN error. So now, when I run
>> >> "ceph health" I get the following:
>> >>
>> >> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>> >>
>> >> So this seems to be from one single root cause. Any ideas? Again, is
>> >> this a corrupted drive issue that I can clean up, or is this still a
>> >> Ceph configuration error?
>> >>
>> >>
>> >> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2gnt@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hi Wyatt,
>> >>>
>> >>> You need to reduce the replication level on your existing pools to 1,
>> >>> or bring up another OSD. The default configuration specifies a
>> >>> replication level of 2, and the default CRUSH rules want to place the
>> >>> two replicas on two distinct OSDs. With only one OSD, CRUSH can't find
>> >>> a placement for the second replica, so Ceph reports a degraded state.
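>> >>>
>> >>> If I remember right, you can check the current level of each pool and
>> >>> then lower it with something like:
>> >>>
>> >>>   ceph osd dump | grep 'rep size'
>> >>>   ceph osd pool set <pool> size 1
>> >>>
>> >>> (syntax from memory, so double-check against the docs)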
>> >>>
>> >>> Dino
>> >>>
>> >>>
>> >>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman
>> >>> <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >>>>
>> >>>> Well, those points solved the issues of the redefined host and the
>> >>>> unknown auth protocol. The
>> >>>>
>> >>>>
>> >>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42
>> >>>> degraded (50.000%)"
>> >>>>
>> >>>> error is still an issue, though. Is this something simple like some
>> >>>> hard drive corruption that I can clean up with a fsck, or is this a
>> >>>> Ceph issue?
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson
>> >>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>>
>> >>>>> Wyatt,
>> >>>>>
>> >>>>> A few notes:
>> >>>>>
>> >>>>> - Yes, the second "host = ceph" under mon.a is redundant and should
>> >>>>> be deleted.
>> >>>>>
>> >>>>> - "auth client required = cephx [osd]" should be simply
>> >>>>> auth client required = cephx".
>> >>>>>
>> >>>>> - Looks like you only have one OSD. You need at least as many OSDs
>> >>>>> as the highest replication level among your pools (and hopefully
>> >>>>> more).
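>> >>>>>
>> >>>>> If that trailing [osd] was meant to start a new section (just a
>> >>>>> guess based on the warning you pasted), that part of the file would
>> >>>>> look roughly like:
>> >>>>>
>> >>>>> auth client required = cephx
>> >>>>>
>> >>>>> [osd]
>> >>>>> osd journal size = 1000
>> >>>>> filestore xattr use omap = true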
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>>
>> >>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>> >>>>>>
>> >>>>>> Here is my ceph.conf. I just figured out that the second "host ="
>> >>>>>> isn't necessary, though it is like that in the 5-minute quick start
>> >>>>>> guide... (Perhaps I'll submit the couple of fixes that I've had to
>> >>>>>> implement so far.) That fixes the "redefined host" issue, but none
>> >>>>>> of the others.
>> >>>>>>
>> >>>>>> [global]
>> >>>>>> # For version 0.55 and beyond, you must explicitly enable or
>> >>>>>> # disable authentication with "auth" entries in [global].
>> >>>>>>
>> >>>>>> auth cluster required = cephx
>> >>>>>> auth service required = cephx
>> >>>>>> auth client required = cephx [osd]
>> >>>>>> osd journal size = 1000
>> >>>>>>
>> >>>>>> #The following assumes ext4 filesystem.
>> >>>>>> filestore xattr use omap = true
>> >>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>> # settings for mkcephfs so that it will create and mount the
>> >>>>>> # file system on a particular OSD for you. Remove the comment `#`
>> >>>>>> # character for the following settings and replace the values in
>> >>>>>> # braces with appropriate values, or leave the following settings
>> >>>>>> # commented out to accept the default values. You must specify
>> >>>>>> # the --mkfs option with mkcephfs in order for the deployment
>> >>>>>> # script to utilize the following settings, and you must define
>> >>>>>> # the 'devs' option for each osd instance; see below.
>> >>>>>> #osd mkfs type = {fs-type}
>> >>>>>> #osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>> >>>>>> #osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>> >>>>>> # For example, for ext4, the mount option might look like this:
>> >>>>>>
>> >>>>>> #osd mkfs options ext4 = user_xattr,rw,noatime
>> >>>>>> # Execute $ hostname to retrieve the name of your host, and
>> >>>>>> # replace {hostname} with the name of your host. For the
>> >>>>>> # monitor, replace {ip-address} with the IP address of your
>> >>>>>> # host.
>> >>>>>> [mon.a]
>> >>>>>> host = ceph
>> >>>>>> mon addr = 10.81.2.100:6789
>> >>>>>> [osd.0]
>> >>>>>> host = ceph
>> >>>>>>
>> >>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>> # settings for mkcephfs so that it will create and mount the
>> >>>>>> # file system on a particular OSD for you. Remove the comment
>> >>>>>> # `#` character for the following setting for each OSD and
>> >>>>>> # specify a path to the device if you use mkcephfs with the
>> >>>>>> # --mkfs option.
>> >>>>>>
>> >>>>>> #devs = {path-to-device}
>> >>>>>> [osd.1]
>> >>>>>> host = ceph
>> >>>>>> #devs = {path-to-device}
>> >>>>>> [mds.a]
>> >>>>>> host = ceph
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>> >>>>>> <mike.dawson@xxxxxxxxxxxxxxxx
>> >>>>>> <mailto:mike.dawson@xxxxxxxxxxxxxxxx>>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> Wyatt,
>> >>>>>>
>> >>>>>> Please post your ceph.conf.
>> >>>>>>
>> >>>>>> - mike
>> >>>>>>
>> >>>>>>
>> >>>>>> On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>> >>>>>>
>> >>>>>> Hi everyone,
>> >>>>>>
>> >>>>>> I'm setting up a test Ceph cluster and am having trouble getting it
>> >>>>>> running (great for testing, huh?). I went through the installation
>> >>>>>> on Debian squeeze and had to modify the mkcephfs script a bit,
>> >>>>>> because it calls monmaptool with too many parameters in the $args
>> >>>>>> variable (mine had "--add a [ip address]:[port] [osd1]" and I had to
>> >>>>>> get rid of the [osd1] part for the monmaptool command to take it).
>> >>>>>> Anyway, so I got it installed, started the service, waited a little
>> >>>>>> while for it to build the fs, and ran "ceph health". I got (and am
>> >>>>>> still getting after a day and a reboot) the following error. (Note:
>> >>>>>> I have also been getting the first line in various calls; I'm not
>> >>>>>> sure why it is complaining, since I followed the instructions...)
>> >>>>>>
>> >>>>>> warning: line 34: 'host' in section 'mon.a' redefined
>> >>>>>> 2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth protocol defined: [osd]
>> >>>>>> HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)
>> >>>>>>
>> >>>>>> Can anybody tell me the root of this issue, and how I can fix it?
>> >>>>>> Thank you!
>> >>>>>>
>> >>>>>> - Wyatt Gorman
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> ______________________________
>> >>> Dino Yancey
>> >>> 2GNT.com Admin
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > ______________________________
>> > Dino Yancey
>> > 2GNT.com Admin
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com