Re: Gentoo & ceph 0.67 & pg stuck After fresh Installation

Philipp,

I have had issues with clock sync on machines before that I could usually alleviate by tweaking the kernel config. Changing CONFIG_HZ to 300 instead of 1000 can help. If you ever reboot the machines, make sure your init system writes the current software clock to the hardware clock on shutdown (if you use OpenRC, /etc/conf.d/hwclock should have 'clock_systohc="YES"'); that can help the situation too.
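For example, on an OpenRC box the relevant pieces of /etc/conf.d/hwclock would look something like this (assuming you use the hwclock service at all):

    clock="UTC"            # or "local", matching how the BIOS clock is set
    clock_hctosys="YES"    # load the hardware clock into the software clock at boot
    clock_systohc="YES"    # write the software clock back to the hardware clock at shutdown

And if your kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC), you can check what HZ it was built with via 'zgrep CONFIG_HZ /proc/config.gz'.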

Some more hardware details might be helpful. On very, very overloaded systems I've seen the software clock drift a lot; you might just be trying to do too much with the number of cores you have. Also, cheap or badly-configured hardware can cause spurious interrupts, so keeping an eye on the context-switches-per-second and interrupts-per-second values over time might give you a clue about the clock drift as well.
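Something like this is enough to watch those numbers over time; in vmstat's output the "in" and "cs" columns are interrupts and context switches per second:

    vmstat 5               # one sample every 5 seconds
    cat /proc/interrupts   # per-device interrupt counters, good for spotting a noisy device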

Glad you found my notes helpful - I didn't write the majority of that howto, though, just the notes at the top :)

-Aaron


On Tue, Jan 28, 2014 at 2:32 PM, Philipp von Strobl-Albeg <philipp@xxxxxxxxxxxx> wrote:
Hi all,

thank you very much for your input.

I sync the clock on all hosts via 'ntpdate pool.ntp.org' and write that time to the hardware clock on every host.
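Roughly, what I run on each host is:

    ntpdate pool.ntp.org    # one-shot sync of the software clock from NTP
    hwclock --systohc       # copy the software clock to the hardware clock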
For some strange reason, one host is out of sync again after a few minutes. I can't say where this comes from...
Perhaps this is a Gentoo-specific thing or a "cheap PC" problem.

What is the worst I have to expect if I don't fix this?


Anyway, I managed to fix the stuck-PGs issue.
I redesigned the CRUSH map (mainly assigned each host to a rack and that rack to root default) and now the health is OK!
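Roughly what I did (rack1 is just the example name I use here):

    ceph osd crush add-bucket rack1 rack       # create a rack bucket
    ceph osd crush move rack1 root=default     # hang the rack under root default
    ceph osd crush move dp1 rack=rack1         # move the hosts under the rack
    ceph osd crush move dp2 rack=rack1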


Thank you again for your kind help, and great job, Inktank ;-)

PS: Aaron, your howto was really helpful.


Best
Philipp


On 20.01.2014 05:59, Sage Weil wrote:

On Sun, 19 Jan 2014, Sherry Shahbazi wrote:
Hi Philipp,

Installing "ntp" on each server might solve the clock skew problem.
At the very least a one-time 'ntpdate time.apple.com' should make that issue go away for the time being.
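To keep the clocks in sync after that, running the ntp daemon beats repeated one-shot syncs; on Gentoo with OpenRC that would be something like:

    emerge net-misc/ntp
    rc-update add ntpd default
    /etc/init.d/ntpd start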

s

  Best Regards
Sherry


On Sunday, January 19, 2014 6:34 AM, Philipp Strobl <philipp@xxxxxxxxxxxx> wrote:
Hi Aaron,

sorry for taking so long...

After I added the OSDs and buckets to the crushmap, I get:

ceph osd tree
# id    weight    type name    up/down    reweight
-3    1    host dp2
1    1        osd.1    up    1
-2    1    host dp1
0    1        osd.0    up    1
-1    0    root default
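For reference, I added them with the usual decompile/edit/recompile cycle, roughly:

    ceph osd getcrushmap -o crush.bin      # dump the current map
    crushtool -d crush.bin -o crush.txt    # decompile to text
    # ... edit crush.txt ...
    crushtool -c crush.txt -o crush.new    # recompile
    ceph osd setcrushmap -i crush.new      # inject the new map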


Both OSDs are up and in:

ceph osd stat
e25: 2 osds: 2 up, 2 in

ceph health detail says:

HEALTH_WARN 292 pgs stuck inactive; 292 pgs stuck unclean; clock skew detected on mon.vmsys-dp2
pg 3.f is stuck inactive since forever, current state creating, last acting []
pg 0.c is stuck inactive since forever, current state creating, last acting []
pg 1.d is stuck inactive since forever, current state creating, last acting []
pg 2.e is stuck inactive since forever, current state creating, last acting []
pg 3.8 is stuck inactive since forever, current state creating, last acting []
pg 0.b is stuck inactive since forever, current state creating, last acting []
pg 1.a is stuck inactive since forever, current state creating, last acting []
...
pg 2.c is stuck unclean since forever, current state creating, last acting []
pg 1.f is stuck unclean since forever, current state creating, last acting []
pg 0.e is stuck unclean since forever, current state creating, last acting []
pg 3.d is stuck unclean since forever, current state creating, last acting []
pg 2.f is stuck unclean since forever, current state creating, last acting []
pg 1.c is stuck unclean since forever, current state creating, last acting []
pg 0.d is stuck unclean since forever, current state creating, last acting []
pg 3.e is stuck unclean since forever, current state creating, last acting []
mon.vmsys-dp2 addr 10.0.0.22:6789/0 clock skew 16.4914s > max 0.05s (latency 0.00666228s)

All PGs have the same status.

Is the clock skew an important factor?

I compiled Ceph like this (eix ceph):
...
Installed versions:  0.67{tbz2}(00:54:50 01/08/14)(fuse -debug -gtk -libatomic -radosgw -static-libs -tcmalloc)
The cluster name is vmsys; the servers are dp1 and dp2.
config:

[global]
     auth cluster required = none
     auth service required = none
     auth client required = none
     auth supported = none
     fsid = 265d12ac-e99d-47b9-9651-05cb2b4387a6

[mon.vmsys-dp1]
     host = dp1
     mon addr = INTERNAL-IP1:6789
     mon data = ...dp1

[mon.vmsys-dp2]
     host = dp2
     mon addr = INTERNAL-IP2:6789
     mon data = ...dp2

[osd]
[osd.0]
     host = dp1
     devs = /dev/sdb1
     osd_mkfs_type = xfs
     osd data = ...
[osd.1]
     host = dp2
     devs = /dev/sdb1
     osd_mkfs_type = xfs
     osd data = ...
[mds.vmsys-dp1]
         host = dp1

[mds.vmsys-dp2]
         host = dp2



Hope this is helpful - I really don't know at the moment what is wrong.

Perhaps I should try the manual-deploy howto from Inktank, or do you have an idea?



Best Philipp

http://www.pilarkto.net
On 10.01.2014 20:50, Aaron Ten Clay wrote:
       Hi Philipp,

It sounds like perhaps you don't have any OSDs that are both "up" and
"in" the cluster. Can you provide the output of "ceph health detail"
and "ceph osd tree" for us?
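A quick way to see the up/in counts at a glance:

    ceph osd stat    # one-line summary: how many OSDs exist, how many are up, how many are in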

As for the "howto" you mentioned, I added some notes to the top but
never really updated the body of the document... I'm not entirely sure
it's straightforward or up to date any longer :) I'd be happy to make
changes as needed but I haven't manually deployed a cluster in several
months, and Inktank now has a manual deployment guide for Ceph at
http://ceph.com/docs/master/install/manual-deployment/

-Aaron



On Fri, Jan 10, 2014 at 6:57 AM, Philipp Strobl <philipp@xxxxxxxxxxxx>
wrote:
       Hi,

After managing to deploy Ceph manually on Gentoo (the ceph-disk tools
are under /usr/usr/sbin...), the daemons come up properly,
but "ceph health" shows a warning for all PGs stuck unclean.
This is strange behavior for a clean new installation, I guess.

So the question is: am I doing something wrong, or can I reset the
PGs to get the cluster running?

Also, the rbd client and mount.ceph hang with no answer.

I used this howto: https://github.com/aarontc/ansible-playbooks/blob/master/roles/ceph.notes-on-deployment.rst

or, alternatively, our German translation/expansion:
http://wiki.open-laboratory.de/Intern:IT:HowTo:Ceph

With "auth supported ... = none"


Best regards
And thank you in advance

Philipp Strobl







--
Aaron Ten Clay
http://www.aarontc.com/










--
Aaron Ten Clay
http://www.aarontc.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
