[ Please keep all discussions on the list. :) ]

Okay, so you've now got just 128 PGs that are sad. Those are all in
pool 2, which I believe is "rbd" — you'll need to set your replication
level to 1 on all pools and that should fix it. :)

Keep in mind that with 1x replication you've only got 1 copy of
everything, though, so if you lose one disk you're going to lose data.
You really want to get enough disks to set 2x replication.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, May 1, 2013 at 2:34 PM, Wyatt Gorman <wyattgorman@xxxxxxxxxxxxxxx> wrote:
> ceph -s
>    health HEALTH_WARN 128 pgs degraded; 128 pgs stuck unclean
>    monmap e1: 1 mons at {a=10.81.2.100:6789/0}, election epoch 1, quorum 0 a
>    osdmap e40: 1 osds: 1 up, 1 in
>    pgmap v759: 384 pgs: 256 active+clean, 128 active+degraded; 8699 bytes
>       data, 3430 MB used, 47828 MB / 54002 MB avail
>    mdsmap e41: 1/1/1 up {0=a=up:active}
>
> http://pastebin.com/0d7UM5s4
>
> Thanks for your help, Greg.
>
>
> On Wed, May 1, 2013 at 4:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Wed, May 1, 2013 at 1:32 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> > Hi Wyatt,
>> >
>> > This is almost certainly a configuration issue. If I recall, there is
>> > a min_size setting in the CRUSH rules for each pool that defaults to
>> > two, which you may also need to reduce to one. I don't have the
>> > documentation in front of me, so that's just off the top of my head...
>>
>> Hmm, no. The min_size should be set automatically to 1/2 of the
>> specified size (rounded up), which would be 1 in this case.
>> What's the full output of ceph -s? Can you pastebin the output of
>> "ceph pg dump" please?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> > Dino
>> >
>> > On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman
>> > <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >> Okay! Dino, thanks for your response. I reduced my metadata pool
>> >> size and data pool size to 1, which eliminated the "recovery 21/42
>> >> degraded (50.000%)" at the end of my HEALTH_WARN error. So now, when
>> >> I run "ceph health" I get the following:
>> >>
>> >> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>> >>
>> >> So this seems to be from one single root cause. Any ideas? Again, is
>> >> this a corrupted drive issue that I can clean up, or is this still a
>> >> ceph configuration error?
>> >>
>> >> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2gnt@xxxxxxxxx> wrote:
>> >>> Hi Wyatt,
>> >>>
>> >>> You need to reduce the replication level on your existing pools to
>> >>> 1, or bring up another OSD. The default configuration specifies a
>> >>> replication level of 2, and the default CRUSH rules want to place a
>> >>> replica on two distinct OSDs. With one OSD, CRUSH can't determine
>> >>> placement for the replica and so Ceph is reporting a degraded state.
>> >>>
>> >>> Dino
>> >>>
>> >>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman
>> >>> <wyattgorman@xxxxxxxxxxxxxxx> wrote:
>> >>>> Well, those points solved the issue of the redefined host and the
>> >>>> unidentified protocol. The
>> >>>>
>> >>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery
>> >>>> 21/42 degraded (50.000%)"
>> >>>>
>> >>>> error is still an issue, though. Is this something simple like
>> >>>> some hard drive corruption that I can clean up with an fsck, or is
>> >>>> this a ceph issue?
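[For reference, Greg's suggestion above maps to the commands below.
This is only a sketch, not output from the thread: the pool names
assume the Bobtail-era defaults ("data", "metadata", "rbd"), so check
"ceph osd lspools" for the actual names before changing anything.]

    ceph osd lspools                  # list pool ids and names
    ceph osd pool set data size 1     # keep a single copy per object
    ceph osd pool set metadata size 1
    ceph osd pool set rbd size 1
    ceph -s                           # PGs should settle to active+clean

With only one OSD, dropping size to 1 (or adding more OSDs) is the way
to reach HEALTH_OK; once more disks are in, the same command with
"size 2" restores 2x replication.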
>> >>>>
>> >>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson
>> >>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>> Wyatt,
>> >>>>>
>> >>>>> A few notes:
>> >>>>>
>> >>>>> - Yes, the second "host = ceph" under mon.a is redundant and
>> >>>>> should be deleted.
>> >>>>>
>> >>>>> - "auth client required = cephx [osd]" should be simply
>> >>>>> "auth client required = cephx".
>> >>>>>
>> >>>>> - Looks like you only have one OSD. You need at least as many
>> >>>>> OSDs as the highest replication level of your pools (and
>> >>>>> hopefully more).
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>> >>>>>> Here is my ceph.conf. I just figured out that the second "host ="
>> >>>>>> isn't necessary, though it is like that on the 5-minute quick
>> >>>>>> start guide... (Perhaps I'll submit the couple of fixes that I've
>> >>>>>> had to implement so far.) That fixes the "redefined host" issue,
>> >>>>>> but none of the others.
>> >>>>>>
>> >>>>>> [global]
>> >>>>>>     # For version 0.55 and beyond, you must explicitly enable or
>> >>>>>>     # disable authentication with "auth" entries in [global].
>> >>>>>>
>> >>>>>>     auth cluster required = cephx
>> >>>>>>     auth service required = cephx
>> >>>>>>     auth client required = cephx [osd]
>> >>>>>>     osd journal size = 1000
>> >>>>>>
>> >>>>>>     # The following assumes ext4 filesystem.
>> >>>>>>     filestore xattr use omap = true
>> >>>>>>
>> >>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>>     # settings for mkcephfs so that it will create and mount the
>> >>>>>>     # file system on a particular OSD for you. Remove the comment
>> >>>>>>     # `#` character for the following settings and replace the
>> >>>>>>     # values in braces with appropriate values, or leave the
>> >>>>>>     # following settings commented out to accept the default
>> >>>>>>     # values. You must specify the --mkfs option with mkcephfs
>> >>>>>>     # in order for the deployment script to utilize the following
>> >>>>>>     # settings, and you must define the 'devs' option for each
>> >>>>>>     # osd instance; see below.
>> >>>>>>
>> >>>>>>     #osd mkfs type = {fs-type}
>> >>>>>>     #osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>> >>>>>>     #osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>> >>>>>>
>> >>>>>>     # For example, for ext4, the mount option might look like this:
>> >>>>>>     #osd mkfs options ext4 = user_xattr,rw,noatime
>> >>>>>>
>> >>>>>>     # Execute $ hostname to retrieve the name of your host, and
>> >>>>>>     # replace {hostname} with the name of your host. For the
>> >>>>>>     # monitor, replace {ip-address} with the IP address of your
>> >>>>>>     # host.
>> >>>>>>
>> >>>>>> [mon.a]
>> >>>>>>     host = ceph
>> >>>>>>     mon addr = 10.81.2.100:6789
>> >>>>>>
>> >>>>>> [osd.0]
>> >>>>>>     host = ceph
>> >>>>>>
>> >>>>>>     # For Bobtail (v 0.56) and subsequent versions, you may add
>> >>>>>>     # settings for mkcephfs so that it will create and mount the
>> >>>>>>     # file system on a particular OSD for you. Remove the comment
>> >>>>>>     # `#` character for the following setting for each OSD and
>> >>>>>>     # specify a path to the device if you use mkcephfs with the
>> >>>>>>     # --mkfs option.
>> >>>>>>     #devs = {path-to-device}
>> >>>>>>
>> >>>>>> [osd.1]
>> >>>>>>     host = ceph
>> >>>>>>     #devs = {path-to-device}
>> >>>>>>
>> >>>>>> [mds.a]
>> >>>>>>     host = ceph
>> >>>>>>
>> >>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>> >>>>>> <mike.dawson@xxxxxxxxxxxxxxxx> wrote:
>> >>>>>>> Wyatt,
>> >>>>>>>
>> >>>>>>> Please post your ceph.conf.
>> >>>>>>>
>> >>>>>>> - mike
>> >>>>>>>
>> >>>>>>> On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>> >>>>>>>> Hi everyone,
>> >>>>>>>>
>> >>>>>>>> I'm setting up a test ceph cluster and am having trouble
>> >>>>>>>> getting it running (great for testing, huh?). I went through
>> >>>>>>>> the installation on Debian squeeze, and had to modify the
>> >>>>>>>> mkcephfs script a bit because it calls monmaptool with too many
>> >>>>>>>> parameters in the $args variable (mine had "--add a
>> >>>>>>>> [ip address]:[port] [osd1]" and I had to get rid of the [osd1]
>> >>>>>>>> part for the monmaptool command to take it). Anyway, I got it
>> >>>>>>>> installed, started the service, waited a little while for it to
>> >>>>>>>> build the fs, and ran "ceph health". I got (and am still
>> >>>>>>>> getting, after a day and a reboot) the following error. (Note:
>> >>>>>>>> I have also been getting the first line in various calls and am
>> >>>>>>>> unsure why it is complaining, since I followed the
>> >>>>>>>> instructions...)
>> >>>>>>>>
>> >>>>>>>> warning: line 34: 'host' in section 'mon.a' redefined
>> >>>>>>>> 2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth
>> >>>>>>>> protocol defined: [osd]
>> >>>>>>>> HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery
>> >>>>>>>> 21/42 degraded (50.000%)
>> >>>>>>>>
>> >>>>>>>> Can anybody tell me the root of this issue, and how I can fix
>> >>>>>>>> it? Thank you!
>> >>>>>>>>
>> >>>>>>>> - Wyatt Gorman
>> >>>
>> >>> --
>> >>> ______________________________
>> >>> Dino Yancey
>> >>> 2GNT.com Admin
>> >
>> > --
>> > ______________________________
>> > Dino Yancey
>> > 2GNT.com Admin

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
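[For reference, a ceph.conf with Mike's notes applied might look
roughly like the sketch below. It is only a sketch: the host name
"ceph" and the monitor address come from the thread, the [osd] section
holds the settings that the stray "[osd]" token was evidently meant to
introduce, and [osd.1] only makes sense once a second OSD actually
exists.]

    [global]
            auth cluster required = cephx
            auth service required = cephx
            # no stray "[osd]" at the end of this line
            auth client required = cephx

    # every section header goes on its own line
    [osd]
            osd journal size = 1000
            # the following assumes an ext4 filesystem
            filestore xattr use omap = true

    [mon.a]
            host = ceph
            mon addr = 10.81.2.100:6789

    [osd.0]
            host = ceph

    [osd.1]
            host = ceph

    [mds.a]
            host = ceph

After editing the file, restarting the daemons (e.g. with
"service ceph -a restart") should make the "'host' ... redefined" and
"unknown auth protocol defined: [osd]" warnings go away; the degraded
PGs are a separate matter of pool size versus OSD count, as discussed
above.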