Re: giant release osd down

Hello,

On Mon, 3 Nov 2014 01:01:32 -0500 (EST) Ian Colle wrote:

> Christian,
> 
> Why are you not fond of ceph-deploy?
> 
In short, this very thread.

Ceph-deploy hides a number of things from users that are pretty vital
for a working Ceph cluster and that are insufficiently documented, or not
documented at all, in the manual-deployment documentation.
Specifically the GPT magic, which isn't documented at all (and no,
dissecting Python code or some blurb on GIT is not the same as
documentation on the Ceph homepage), and flag files like sysvinit.
There are numerous cases on this ML where people wound up with OSDs that
didn't start (at least at boot time) because of this omission and the
resulting dependence on ceph-deploy.

That GPT magic also makes things a lot less flexible (can't use a full
device, have to partition it first) and leads to hilarious things like
ceph-deploy "preparing" an OSD and udev happily starting it up even though
that wasn't requested.
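
For reference, the "magic" boils down to GPT partition type GUIDs that
ceph-disk sets and a udev rule then matches to auto-activate the OSD.
Something along these lines (the data-partition GUID below is from
memory, check it against ceph-disk before relying on it):

    # tag partition 1 of /dev/sdb as a Ceph OSD data partition
    sudo sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceef05d /dev/sdb
    sudo partprobe /dev/sdb
    # the 95-ceph-osd udev rule now runs "ceph-disk activate" on it,
    # requested or not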

So when people fail at a manual deploy, the answer tends to be "use
ceph-deploy" (and, in my particular reply, to go on from there) instead
of "Did you follow the docs in section blah?".

Then there are problems with ceph-deploy itself, like correctly picking up
formatting parameters from the config, but NOT defaulting to the
filesystem type specified there.
And since its role is supposed to be helping people with quick deployment
(and teardown) of test clusters, the lack of remove functionality for
OSDs isn't particularly helpful either.
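
For comparison, tearing down a single OSD by hand is a whole little dance
along these lines (a sketch, assuming osd.0 on a sysvinit host):

    ceph osd out 0
    sudo /etc/init.d/ceph stop osd.0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    sudo umount /var/lib/ceph/osd/ceph-0

Exactly the kind of thing a deployment/teardown tool ought to wrap for you.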

Christian

> Ian R. Colle
> Global Director
> of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.303.601.7713
> Email: icolle@xxxxxxxxxx
> 
> ----- Original Message -----
> From: "Christian Balzer" <chibi@xxxxxxx>
> To: ceph-users@xxxxxxxx
> Cc: "Shiv Raj Singh" <virk.shiv@xxxxxxxxx>
> Sent: Sunday, November 2, 2014 8:37:18 AM
> Subject: Re:  giant release osd down
> 
> 
> Hello,
> 
> On Mon, 3 Nov 2014 00:48:20 +1300 Shiv Raj Singh wrote:
> 
> > Hi All
> > 
> > I am new to Ceph and I have been trying to configure a 3-node Ceph
> > cluster with 1 monitor and 2 OSD nodes. I have reinstalled and recreated
> > the cluster three times and I am stuck against the wall. My monitor is
> > working as desired (I guess) but the status of the OSDs is down. I am
> > following this link
> > http://docs.ceph.com/docs/v0.80.5/install/manual-deployment/ for
> > configuring the OSDs. The reason why I am not using ceph-deploy is
> > that I want to understand the technology.
> > 
> > Can someone please help me understand what I'm doing wrong!! :-) !!
> > 
> a) You're using OSS. Caveat emptor and so forth.
> In particular you seem to be following documentation for Firefly while
> the 64 PGs below indicate that you're actually installing Giant.
> 
> b) Since Firefly, Ceph defaults to a replication size of 3, so 2 OSDs
> won't do.
> 
> c) But wait, you specified a pool size of 2 in your OSD section! Tough
> luck, because since Firefly there is a bug that at the very least
> prevents OSD and RGW parameters from being parsed outside the global
> section (which incidentally is what the documentation you cited
> suggests...)
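> 
> The usual workaround is to put those defaults into [global] and, for the
> pool that already exists, fix it up at runtime; roughly (assuming the
> default "rbd" pool):
> 
>     [global]
>             osd pool default size = 2
>             osd pool default min size = 1
> 
>     ceph osd pool set rbd size 2
>     ceph osd pool set rbd min_size 1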
> 
> d) Your OSDs are down, so all of the above is (kinda) pointless.
> 
> So without further info (log files, etc) we won't be able to help you
> much.
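> 
> As a starting point, the OSD logs on the respective hosts and a manual
> start attempt usually tell the story, along the lines of (default paths
> assumed):
> 
>     tail -n 50 /var/log/ceph/ceph-osd.0.log
>     sudo start ceph-osd id=0              # upstart, i.e. Ubuntu 14.04
>     sudo /etc/init.d/ceph start osd.0     # sysvinit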
> 
> My suggestion would be to take the above to heart, try with ceph-deploy
> (which I'm not fond of) and if that works try again manually and see
> where it fails.
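> 
> The ceph-deploy attempt would be little more than (guessing at your
> device names):
> 
>     ceph-deploy osd prepare ceph2:/dev/sdb ceph3:/dev/sdb
>     ceph-deploy osd activate ceph2:/dev/sdb1 ceph3:/dev/sdb1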
> 
> Regards,
> 
> Christian
> 
> > *Some useful diagnostic information *
> > ceph2:~$ ceph osd tree
> > # id    weight  type name       up/down reweight
> > -1      2       root default
> > -3      1               host ceph2
> > 0       1                       osd.0   down    0
> > -2      1               host ceph3
> > 1       1                       osd.1   down    0
> > 
> > ceph health detail
> > HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
> > pg 0.22 is stuck inactive since forever, current state creating, last
> > acting []
> > pg 0.21 is stuck inactive since forever, current state creating, last
> > acting []
> > pg 0.20 is stuck inactive since forever, current state creating, last
> > acting []
> > 
> > 
> > ceph -s
> >     cluster a04ee359-82f8-44c4-89b5-60811bef3f19
> >      health HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
> >      monmap e1: 1 mons at {ceph1=192.168.101.41:6789/0}, election epoch
> > 1, quorum 0 ceph1
> >      osdmap e9: 2 osds: 0 up, 0 in
> >       pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
> >             0 kB used, 0 kB / 0 kB avail
> >                   64 creating
> > 
> > 
> > My configurations are as below:
> > 
> > sudo nano /etc/ceph/ceph.conf
> > 
> > [global]
> > 
> >         fsid = a04ee359-82f8-44c4-89b5-60811bef3f19
> >         mon initial members = ceph1
> >         mon host = 192.168.101.41
> >         public network = 192.168.101.0/24
> > 
> >         auth cluster required = cephx
> >         auth service required = cephx
> >         auth client required = cephx
> > 
> > 
> > 
> > [osd]
> >         osd journal size = 1024
> >         filestore xattr use omap = true
> > 
> >         osd pool default size = 2
> >         osd pool default min size = 1
> >         osd pool default pg num = 333
> >         osd pool default pgp num = 333
> >         osd crush chooseleaf type = 1
> > 
> > [mon.ceph1]
> >         host = ceph1
> >         mon addr = 192.168.101.41:6789
> > 
> > 
> > [osd.0]
> >         host = ceph2
> >         #devs = {path-to-device}
> > 
> > [osd.1]
> >         host = ceph3
> >         #devs = {path-to-device}
> > 
> > 
> > ..........
> > 
> > OSD mount location
> > 
> > On ceph2
> > /dev/sdb1                              5.0G  1.1G  4.0G  21%
> > /var/lib/ceph/osd/ceph-0
> > 
> > on Ceph3
> > /dev/sdb1                              5.0G  1.1G  4.0G  21%
> > /var/lib/ceph/osd/ceph-1
> > 
> > My Linux OS
> > 
> > lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description:    Ubuntu 14.04 LTS
> > Release:        14.04
> > Codename:       trusty
> > 
> > Regards
> > 
> > Shiv
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



