Philipp,
I have had clock sync issues on machines before that I could usually alleviate by tweaking the kernel config. Changing CONFIG_HZ to 300 instead of 1000 can help. If you ever reboot the machines, it also helps to make sure your init system writes the current software clock to the hardware clock on shutdown (if you use OpenRC, /etc/conf.d/hwclock should have 'clock_systohc="YES"').

Some more hardware details might be helpful. On very, very overloaded systems I've seen the software clock drift a lot; you might just be trying to do too much with the number of cores you have. Also, cheap or badly configured hardware can cause spurious interrupts, so keeping an eye on the context-switches-per-second and interrupts-per-second values over time might give a clue about clock drift as well.
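A minimal sketch of how those two rates could be watched on Linux, assuming the counters are read from /proc/stat (its "intr" and "ctxt" lines hold totals since boot). The function names and the one-second default interval are illustrative choices, not something from this thread; vmstat reports the same figures if you prefer an existing tool.

```python
import time


def parse_proc_stat(text):
    """Return (total_interrupts, total_context_switches) from /proc/stat text."""
    intr = ctxt = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "intr":
            # The first number after "intr" is the total since boot.
            intr = int(fields[1])
        elif fields[0] == "ctxt":
            ctxt = int(fields[1])
    if intr is None or ctxt is None:
        raise ValueError("intr/ctxt lines not found")
    return intr, ctxt


def rates(before, after, seconds):
    """Convert two counter snapshots into per-second rates."""
    return tuple((b - a) / seconds for a, b in zip(before, after))


def sample(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return rates(first, second, interval)
```

Calling sample() on each host returns a pair (interrupts per second, context switches per second). Logging that pair periodically and comparing the hosts may show whether the one that drifts is also the one with unusual interrupt load.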
Glad you found my notes helpful - I didn't write the majority of that howto, though, just the notes at the top :)
-Aaron
On Tue, Jan 28, 2014 at 2:32 PM, Philipp von Strobl-Albeg <philipp@xxxxxxxxxxxx> wrote:
Hi all,
thank you very much for your input.
I sync the clock on all hosts with ntpdate pool.ntp.org and then write it to the hwclock on every host.
For some strange reason, one host is out of sync again after a few minutes. I can't say where this comes from...
Perhaps this is a Gentoo-specific thing or a "cheap PC" problem.
What is the worst thing I have to expect if I don't fix this?
Anyway, I managed to fix the stuck PGs.
I redesigned the CRUSH map (mainly put the hosts into a rack and the rack under default) and now the health is OK!
Thank you again for your kind help and great job - Inktank ;-)
PS: Aaron - your Howto was really helpful
Best
Philipp
On 20.01.2014 05:59, Sage Weil wrote:
On Sun, 19 Jan 2014, Sherry Shahbazi wrote:
Hi Philipp,
Installing "ntp" on each server might solve the clock skew problem.
At the very least a one-time 'ntpdate time.apple.com' should make that
issue go away for the time being.
s
Best Regards
Sherry
On Sunday, January 19, 2014 6:34 AM, Philipp Strobl <philipp@xxxxxxxxxxxx>
wrote:
Hi Aaron,
sorry for taking so long...
After I added the OSDs and buckets to the crushmap, I get:
ceph osd tree
# id    weight  type name       up/down reweight
-3      1       host dp2
1       1               osd.1   up      1
-2      1       host dp1
0       1               osd.0   up      1
-1      0       root default
Both osds are up and in
ceph osd stat
e25: 2 osds: 2 up, 2 in
ceph health detail says:
HEALTH_WARN 292 pgs stuck inactive; 292 pgs stuck unclean; clock skew
detected on mon.vmsys-dp2
pg 3.f is stuck inactive since forever, current state creating, last acting
[]
pg 0.c is stuck inactive since forever, current state creating, last acting
[]
pg 1.d is stuck inactive since forever, current state creating, last acting
[]
pg 2.e is stuck inactive since forever, current state creating, last acting
[]
pg 3.8 is stuck inactive since forever, current state creating, last acting
[]
pg 0.b is stuck inactive since forever, current state creating, last acting
[]
pg 1.a is stuck inactive since forever, current state creating, last acting
[]
...
pg 2.c is stuck unclean since forever, current state creating, last acting
[]
pg 1.f is stuck unclean since forever, current state creating, last acting
[]
pg 0.e is stuck unclean since forever, current state creating, last acting
[]
pg 3.d is stuck unclean since forever, current state creating, last acting
[]
pg 2.f is stuck unclean since forever, current state creating, last acting
[]
pg 1.c is stuck unclean since forever, current state creating, last acting
[]
pg 0.d is stuck unclean since forever, current state creating, last acting
[]
pg 3.e is stuck unclean since forever, current state creating, last acting
[]
mon.vmsys-dp2 addr 10.0.0.22:6789/0 clock skew 16.4914s > max 0.05s (latency
0.00666228s)
All pgs have the same status.
Is the clock skew an important factor?
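For tracking that warning over time, here is a small Python sketch that pulls the monitor name, the measured skew, and the allowed maximum out of "ceph health detail" text. The regular expression is an assumption based only on the line format quoted above and may not match other Ceph releases; find_clock_skew is a hypothetical helper name, not part of Ceph.

```python
import re

# Pattern assumed from the quoted output:
#   mon.<name> addr <ip:port/nonce> clock skew <seconds>s > max <seconds>s
SKEW_RE = re.compile(
    r"(mon\.\S+) addr \S+ clock skew ([0-9.]+)s > max ([0-9.]+)s"
)


def find_clock_skew(health_detail):
    """Return a list of (monitor, skew_seconds, max_seconds) tuples."""
    # Collapse whitespace first so lines wrapped by a mail client still match.
    text = " ".join(health_detail.split())
    return [
        (m.group(1), float(m.group(2)), float(m.group(3)))
        for m in SKEW_RE.finditer(text)
    ]
```

Fed the output above, this returns [("mon.vmsys-dp2", 16.4914, 0.05)], i.e. the monitor on dp2 is about 16.5 seconds off against an allowed 0.05 seconds.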
I compiled ceph like this - eix ceph:
...
Installed versions: 0.67{tbz2}(00:54:50 01/08/14)(fuse -debug -gtk
-libatomic -radosgw -static-libs -tcmalloc)
cluster name is vmsys, servers are dp1 and dp2
config:
[global]
auth cluster required = none
auth service required = none
auth client required = none
auth supported = none
fsid = 265d12ac-e99d-47b9-9651-05cb2b4387a6
[mon.vmsys-dp1]
host = dp1
mon addr = INTERNAL-IP1:6789
mon data = "">dp1
[mon.vmsys-dp2]
host = dp2
mon addr = INTERNAL-IP2:6789
mon data = "">dp2
[osd]
[osd.0]
host = dp1
devs = /dev/sdb1
osd_mkfs_type = xfs
osd data = "">
[osd.1]
host = dp2
devs = /dev/sdb1
osd_mkfs_type = xfs
osd data = "">
[mds.vmsys-dp1]
host = dp1
[mds.vmsys-dp2]
host = dp2
Hope this is helpful - I really don't know at the moment what is wrong.
Perhaps I'll try the manual deployment howto from Inktank, or do you have an idea?
Best Philipp
http://www.pilarkto.net
On 10.01.2014 20:50, Aaron Ten Clay wrote:
Hi Philipp,
It sounds like perhaps you don't have any OSDs that are both "up" and
"in" the cluster. Can you provide the output of "ceph health detail"
and "ceph osd tree" for us?
As for the "howto" you mentioned, I added some notes to the top but
never really updated the body of the document... I'm not entirely sure
it's straightforward or up to date any longer :) I'd be happy to make
changes as needed but I haven't manually deployed a cluster in several
months, and Inktank now has a manual deployment guide for Ceph at
http://ceph.com/docs/master/install/manual-deployment/
-Aaron
On Fri, Jan 10, 2014 at 6:57 AM, Philipp Strobl <philipp@xxxxxxxxxxxx>
wrote:
Hi,
After I managed to deploy Ceph manually on Gentoo (the ceph-disk tools
are under /usr/usr/sbin...), the daemons come up properly,
but "ceph health" shows a warning for all PGs stuck unclean.
This is strange behavior for a clean new installation, I guess.
So the question is: am I doing something wrong, or can I reset the
PGs to get the cluster running?
Also, the rbd client or mount.ceph hangs with no answer.
I used this howto: https://github.com/aarontc/ansible-playbooks/blob/master/roles/ceph.notes-on-deployment.rst
and our German translation/expansion of it:
http://wiki.open-laboratory.de/Intern:IT:HowTo:Ceph
With auth Support ... = none
Best regards
And thank you in advance
Philipp Strobl
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Aaron Ten Clay
http://www.aarontc.com/