On Wed, May 1, 2013 at 1:32 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
> Hi Wyatt,
>
> This is almost certainly a configuration issue. If I recall, there is a
> min_size setting in the CRUSH rules for each pool that defaults to two,
> which you may also need to reduce to one. I don't have the documentation
> in front of me, so that's just off the top of my head...

Hmm, no. The min_size should be set automatically to 1/2 of the
specified size (rounded up), which would be 1 in this case.

What's the full output of "ceph -s"? Can you pastebin the output of
"ceph pg dump", please?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> Dino
>
>
> On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman <wyattgorman@xxxxxxxxxxxxxxx>
> wrote:
>>
>> Okay! Dino, thanks for your response. I reduced my metadata pool size and
>> data pool size to 1, which eliminated the "recovery 21/42 degraded
>> (50.000%)" at the end of my HEALTH_WARN error. So now, when I run "ceph
>> health" I get the following:
>>
>> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>>
>> So this seems to be from one single root cause. Any ideas? Again, is this
>> a corrupted drive issue that I can clean up, or is this still a ceph
>> configuration error?
>>
>>
>> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>>>
>>> Hi Wyatt,
>>>
>>> You need to reduce the replication level on your existing pools to 1, or
>>> bring up another OSD. The default configuration specifies a replication
>>> level of 2, and the default CRUSH rules want to place a replica on two
>>> distinct OSDs. With one OSD, CRUSH can't determine placement for the
>>> replica, and so Ceph is reporting a degraded state.
>>>
>>> Dino
>>>
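For reference, the commands for what Dino describes would look roughly like
the following on a Bobtail-era cluster -- assuming the default "data",
"metadata", and "rbd" pools; substitute your own pool names if they differ:

    ceph osd dump | grep pool          # list pools and their replication settings
    ceph osd pool set data size 1
    ceph osd pool set metadata size 1
    ceph osd pool set rbd size 1

Running with size 1 is only reasonable on a throwaway test cluster, since the
data then has no redundancy; adding a second OSD and keeping size 2 is the
better long-term fix.
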
>>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman
>>> <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Well, those points solved the issue of the redefined host and the
>>>> unidentified protocol. The
>>>>
>>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42
>>>> degraded (50.000%)"
>>>>
>>>> error is still an issue, though. Is this something simple, like some
>>>> hard drive corruption that I can clean up with an fsck, or is this a
>>>> ceph issue?
>>>>
>>>>
>>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson
>>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Wyatt,
>>>>>
>>>>> A few notes:
>>>>>
>>>>> - Yes, the second "host = ceph" under mon.a is redundant and should
>>>>> be deleted.
>>>>>
>>>>> - "auth client required = cephx [osd]" should be simply
>>>>> "auth client required = cephx".
>>>>>
>>>>> - Looks like you only have one OSD. You need at least as many (and
>>>>> hopefully more) OSDs as the highest replication level among your
>>>>> pools.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>>>>>>
>>>>>> Here is my ceph.conf. I just figured out that the second "host ="
>>>>>> isn't necessary, though it is like that in the 5-minute quick start
>>>>>> guide... (Perhaps I'll submit the couple of fixes that I've had to
>>>>>> implement so far.) That fixes the "redefined host" issue, but none
>>>>>> of the others.
>>>>>>
>>>>>> [global]
>>>>>>     # For version 0.55 and beyond, you must explicitly enable or
>>>>>>     # disable authentication with "auth" entries in [global].
>>>>>>     auth cluster required = cephx
>>>>>>     auth service required = cephx
>>>>>>     auth client required = cephx [osd]
>>>>>>     osd journal size = 1000
>>>>>>
>>>>>>     # The following assumes ext4 filesystem.
>>>>>>     filestore xattr use omap = true
>>>>>>
>>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>>>>>>     # settings for mkcephfs so that it will create and mount the
>>>>>>     # file system on a particular OSD for you. Remove the comment
>>>>>>     # `#` character for the following settings and replace the
>>>>>>     # values in braces with appropriate values, or leave the
>>>>>>     # following settings commented out to accept the default
>>>>>>     # values. You must specify the --mkfs option with mkcephfs in
>>>>>>     # order for the deployment script to utilize the following
>>>>>>     # settings, and you must define the 'devs' option for each osd
>>>>>>     # instance; see below.
>>>>>>     # osd mkfs type = {fs-type}
>>>>>>     # osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>>>>>>     # osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>>>>>>     # For example, for ext4, the mount option might look like this:
>>>>>>     # osd mkfs options ext4 = user_xattr,rw,noatime
>>>>>>
>>>>>>     # Execute $ hostname to retrieve the name of your host, and
>>>>>>     # replace {hostname} with the name of your host. For the
>>>>>>     # monitor, replace {ip-address} with the IP address of your
>>>>>>     # host.
>>>>>>
>>>>>> [mon.a]
>>>>>>     host = ceph
>>>>>>     mon addr = 10.81.2.100:6789
>>>>>>
>>>>>> [osd.0]
>>>>>>     host = ceph
>>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>>>>>>     # settings for mkcephfs so that it will create and mount the
>>>>>>     # file system on a particular OSD for you. Remove the comment
>>>>>>     # `#` character for the following setting for each OSD and
>>>>>>     # specify a path to the device if you use mkcephfs with the
>>>>>>     # --mkfs option.
>>>>>>     # devs = {path-to-device}
>>>>>>
>>>>>> [osd.1]
>>>>>>     host = ceph
>>>>>>     # devs = {path-to-device}
>>>>>>
>>>>>> [mds.a]
>>>>>>     host = ceph
>>>>>>
>>>>>>
>>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>>>>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>     Wyatt,
>>>>>>
>>>>>>     Please post your ceph.conf.
>>>>>>
>>>>>>     - mike
>>>>>>
>>>>>>
>>>>>>     On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>>>>>>
>>>>>>         Hi everyone,
>>>>>>
>>>>>>         I'm setting up a test ceph cluster and am having trouble
>>>>>>         getting it running (great for testing, huh?). I went through
>>>>>>         the installation on Debian squeeze, and had to modify the
>>>>>>         mkcephfs script a bit because it calls monmaptool with too
>>>>>>         many parameters in the $args variable (mine had
>>>>>>         "--add a [ip address]:[port] [osd1]", and I had to get rid
>>>>>>         of the [osd1] part for the monmaptool command to take it).
>>>>>>         Anyway, I got it installed, started the service, waited a
>>>>>>         little while for it to build the fs, and ran "ceph health",
>>>>>>         and got (and am still getting after a day and a reboot) the
>>>>>>         following error. (Note: I have also been getting the first
>>>>>>         line in various calls; I'm unsure why it is complaining, as
>>>>>>         I followed the instructions...)
>>>>>>
>>>>>>         warning: line 34: 'host' in section 'mon.a' redefined
>>>>>>         2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth
>>>>>>         protocol defined: [osd]
>>>>>>         HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean;
>>>>>>         recovery 21/42 degraded (50.000%)
>>>>>>
>>>>>>         Can anybody tell me the root of this issue, and how I can
>>>>>>         fix it? Thank you!
>>>>>>
>>>>>>         - Wyatt Gorman
>>>>>>
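As an aside, the "unknown auth protocol defined: [osd]" warning above comes
from the "[osd]" section header having been folded onto the end of the auth
line in the posted ceph.conf. With Mike's fix applied, and assuming the osd
settings were meant to sit under [osd] as in the quick start guide, that part
of the file would look roughly like this (shown only as an illustration):

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx

    [osd]
        osd journal size = 1000
        filestore xattr use omap = true
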
>>>
>>> --
>>> ______________________________
>>> Dino Yancey
>>> 2GNT.com Admin
>
>
> --
> ______________________________
> Dino Yancey
> 2GNT.com Admin
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com