Re: Gentoo & ceph 0.67 & pg stuck After fresh Installation

Philipp von Strobl-Albeg <philipp@xxxxxxxxxxxx> · Tue, 28 Jan 2014 23:32:08 +0100

Hi all,

thank you very much for your input.

I synced the clock on all hosts with ntpdate pool.ntp.org and then synced
the hwclock to it on every host.
For some strange reason, one host is out of sync again after a few minutes.
I can't say where this comes from...
Perhaps this is a Gentoo-specific thing or a "cheap PC" problem.

What is the worst thing I have to expect if I don't fix this?

Anyway, I managed to fix the stuck PGs.
I redesigned the CRUSH map (mainly, I put the hosts into a rack and the
rack under default) and now the health is OK!
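
Why this helps can be sketched in a few lines of Python. This is a simplified model and not how CRUSH is actually implemented: it only checks which OSDs are reachable when walking down from the root bucket "default". The rack name "rack1" and the helper osds_under are made up for illustration; the host and OSD names are the ones from this thread.

```python
# Simplified model of a CRUSH hierarchy: bucket name -> list of children.
# Illustration only; this is not Ceph's real data structure.

def osds_under(tree, bucket):
    """Return the sorted OSD names reachable by walking down from `bucket`."""
    found = []
    for child in tree.get(bucket, []):
        if child.startswith("osd."):
            found.append(child)
        else:
            found.extend(osds_under(tree, child))
    return sorted(found)

# Before: the hosts exist, but nothing links them to the root "default".
before = {
    "default": [],
    "dp1": ["osd.0"],
    "dp2": ["osd.1"],
}

# After: hosts -> rack -> default ("rack1" is a hypothetical name).
after = {
    "default": ["rack1"],
    "rack1": ["dp1", "dp2"],
    "dp1": ["osd.0"],
    "dp2": ["osd.1"],
}

print(osds_under(before, "default"))  # []
print(osds_under(after, "default"))   # ['osd.0', 'osd.1']
```

As far as I understand it, a rule that starts at "default" has no OSD to choose from while the root is empty, which would match PGs staying in "creating" with an empty acting set.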

Thank you again for your kind help, and great job, Inktank ;-)

PS: Aaron - your Howto was really helpful

Best
Philipp

On 20.01.2014 05:59, Sage Weil wrote:
On Sun, 19 Jan 2014, Sherry Shahbazi wrote:
Hi Philipp,

Installing "ntp" on each server might solve the clock skew problem.
At the very least a one-time 'ntpdate time.apple.com' should make that
issue go away for the time being.

s

Best Regards
Sherry

On Sunday, January 19, 2014 6:34 AM, Philipp Strobl <philipp@xxxxxxxxxxxx>
wrote:
Hi Aaron,

sorry for taking so long...

After I added the OSDs and buckets to the crushmap, I get

ceph osd tree
# id    weight    type name    up/down    reweight
-3    1    host dp2
1    1        osd.1    up    1
-2    1    host dp1
0    1        osd.0    up    1
-1    0    root default

Both OSDs are up and in:

ceph osd stat
e25: 2 osds: 2 up, 2 in
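
The tree output above appears to show the root "default" with weight 0 and both hosts listed outside of it. A minimal sketch of how one might check for that automatically, assuming the plain-text column layout pasted above (the helper names parse_osd_tree and empty_roots are made up, and other Ceph versions may format the output differently):

```python
def parse_osd_tree(text):
    """Parse plain `ceph osd tree` output (as pasted above) into dicts."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and the header
        node_id, weight = int(parts[0]), float(parts[1])
        if node_id < 0:
            # Buckets have negative ids: "<id> <weight> <type> <name>"
            entries.append({"id": node_id, "weight": weight,
                            "type": parts[2], "name": parts[3]})
        else:
            # OSDs: "<id> <weight> <name> <up/down> <reweight>"
            entries.append({"id": node_id, "weight": weight,
                            "type": "osd", "name": parts[2]})
    return entries


def empty_roots(entries):
    """Names of root buckets whose weight is 0."""
    return [e["name"] for e in entries
            if e["type"] == "root" and e["weight"] == 0]


TREE = """\
# id    weight    type name    up/down    reweight
-3    1    host dp2
1    1        osd.1    up    1
-2    1    host dp1
0    1        osd.0    up    1
-1    0    root default
"""

print(empty_roots(parse_osd_tree(TREE)))  # ['default']
```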

ceph health detail says:

HEALTH_WARN 292 pgs stuck inactive; 292 pgs stuck unclean; clock skew detected on mon.vmsys-dp2
pg 3.f is stuck inactive since forever, current state creating, last acting []
pg 0.c is stuck inactive since forever, current state creating, last acting []
pg 1.d is stuck inactive since forever, current state creating, last acting []
pg 2.e is stuck inactive since forever, current state creating, last acting []
pg 3.8 is stuck inactive since forever, current state creating, last acting []
pg 0.b is stuck inactive since forever, current state creating, last acting []
pg 1.a is stuck inactive since forever, current state creating, last acting []
...
pg 2.c is stuck unclean since forever, current state creating, last acting []
pg 1.f is stuck unclean since forever, current state creating, last acting []
pg 0.e is stuck unclean since forever, current state creating, last acting []
pg 3.d is stuck unclean since forever, current state creating, last acting []
pg 2.f is stuck unclean since forever, current state creating, last acting []
pg 1.c is stuck unclean since forever, current state creating, last acting []
pg 0.d is stuck unclean since forever, current state creating, last acting []
pg 3.e is stuck unclean since forever, current state creating, last acting []
mon.vmsys-dp2 addr 10.0.0.22:6789/0 clock skew 16.4914s > max 0.05s (latency 0.00666228s)

All pgs have the same status.

Is the clock skew an important factor?
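
To put the number in perspective: the skew line in the output above reports 16.4914s against a limit of 0.05s. A small sketch that pulls both values out of such a line and compares them, assuming the message format shown above (the regular expression and the helper parse_skew are my own, not part of Ceph):

```python
import re

# Matches the "clock skew <x>s > max <y>s" part of the health detail line.
SKEW_RE = re.compile(r"clock skew (?P<skew>[\d.]+)s > max (?P<max>[\d.]+)s")


def parse_skew(line):
    """Return (skew, allowed) in seconds, or None if the line has no skew info."""
    m = SKEW_RE.search(line)
    if m is None:
        return None
    return float(m.group("skew")), float(m.group("max"))


line = ("mon.vmsys-dp2 addr 10.0.0.22:6789/0 clock skew 16.4914s > max 0.05s "
        "(latency 0.00666228s)")
skew, allowed = parse_skew(line)
print(f"skew {skew}s is about {skew / allowed:.0f}x the allowed {allowed}s")
# skew 16.4914s is about 330x the allowed 0.05s
```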

I compiled ceph like this - eix ceph:
...
Installed versions:  0.67{tbz2}(00:54:50 01/08/14)(fuse -debug -gtk -libatomic -radosgw -static-libs -tcmalloc)

cluster name is vmsys, servers are dp1 and dp2
config:

[global]
     auth cluster required = none
     auth service required = none
     auth client required = none
     auth supported = none
     fsid = 265d12ac-e99d-47b9-9651-05cb2b4387a6

[mon.vmsys-dp1]
     host = dp1
     mon addr = INTERNAL-IP1:6789
     mon data = /var/lib/ceph/mon/ceph-vmsys-dp1

[mon.vmsys-dp2]
     host = dp2
     mon addr = INTERNAL-IP2:6789
     mon data = /var/lib/ceph/mon/ceph-vmsys-dp2

[osd]
[osd.0]
     host = dp1
     devs = /dev/sdb1
     osd_mkfs_type = xfs
     osd data = /var/lib/ceph/osd/ceph-0

[osd.1]
     host = dp2
     devs = /dev/sdb1
     osd_mkfs_type = xfs
     osd data = /var/lib/ceph/osd/ceph-1

[mds.vmsys-dp1]
     host = dp1

[mds.vmsys-dp2]
     host = dp2

I hope this is helpful. I really don't know at the moment what is wrong.

Perhaps I will try the manual deployment howto from Inktank, or do you have an idea?

Best Philipp

http://www.pilarkto.net
On 10.01.2014 20:50, Aaron Ten Clay wrote:
Hi Philipp,

It sounds like perhaps you don't have any OSDs that are both "up" and
"in" the cluster. Can you provide the output of "ceph health detail"
and "ceph osd tree" for us?

As for the "howto" you mentioned, I added some notes to the top but
never really updated the body of the document... I'm not entirely sure
it's straightforward or up to date any longer :) I'd be happy to make
changes as needed but I haven't manually deployed a cluster in several
months, and Inktank now has a manual deployment guide for Ceph at
http://ceph.com/docs/master/install/manual-deployment/

-Aaron

On Fri, Jan 10, 2014 at 6:57 AM, Philipp Strobl <philipp@xxxxxxxxxxxx>
wrote:
Hi,

After I managed to deploy Ceph manually on Gentoo (the ceph-disk tools
are under /usr/usr/sbin...), the daemons come up properly,
but "ceph health" shows a warning that all PGs are stuck unclean.
I guess this is strange behavior for a clean new installation.

So the question is: am I doing something wrong, or can I reset the
PGs to get the cluster running?

Also, the rbd client or mount.ceph hangs with no response.

I used this howto:
https://github.com/aarontc/ansible-playbooks/blob/master/roles/ceph.notes-on-deployment.rst

Or rather, our German translation/expansion of it:
http://wiki.open-laboratory.de/Intern:IT:HowTo:Ceph

With auth support ... = none

Best regards
And thank you in advance

Philipp Strobl

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Aaron Ten Clay
http://www.aarontc.com/
