You need to fix your clocks (usually with ntp). According to the log
message they are allowed to be off by at most 50ms, and yours seem to be
about 85ms off. A minimal command sequence for CentOS 6 is sketched below
the quoted message.

On 6/6/2013 8:40 PM, Joshua Mesilane wrote:
> Hi,
>
> I'm currently evaluating Ceph as a solution for some HA storage that
> we're looking at. To test, I have 3 servers, each with two disks to be
> used for OSDs (journals on the same disk as the OSD). I've deployed
> the cluster with 3 mons (one on each server), 6 OSDs (2 on each server),
> and 3 MDS (1 on each server).
>
> I've built the cluster using ceph-deploy checked out from git on my
> local workstation (Fedora 15), and the servers themselves are running
> CentOS 6.4.
>
> First note: It looks like the ceph-deploy tool, when you run
> "ceph-deploy osd prepare host:device", actually also activates the
> OSD when it's done instead of waiting for you to run the "ceph-deploy
> osd activate" command.
>
> Question: Is ceph-deploy supposed to write out the [mon] and
> [osd] sections to the ceph.conf configuration file? I can't find any
> reference to anything in the config file except for the [global]
> section, and there are no other sections.
>
> Question: Once I got all 6 of my OSDs online, I'm getting the following
> health warning:
>
> "health HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; clock skew
> detected on mon.sv-dev-ha02, mon.sv-dev-ha03"
>
> "ceph health detail" gives me (truncated for readability):
>
> [root@sv-dev-ha02 ~]# ceph health detail
> HEALTH_WARN 91 pgs degraded; 192 pgs stale; 192 pgs stuck unclean; 2/6
> in osds are down; clock skew detected on mon.sv-dev-ha02, mon.sv-dev-ha03
> pg 2.3d is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
> pg 1.3e is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
> ... (lots more lines like this) ...
> pg 1.1 is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
> pg 0.0 is stuck unclean since forever, current state
> stale+active+degraded, last acting [0]
> pg 0.3f is stale+active+remapped, acting [1,0]
> pg 1.3e is stale+active+remapped, acting [1,0]
> ... (lots more lines like this) ...
> pg 1.1 is stale+active+remapped, acting [1,0]
> pg 2.2 is stale+active+remapped, acting [1,0]
> osd.0 is down since epoch 25, last address 10.20.100.90:6800/3994
> osd.1 is down since epoch 25, last address 10.20.100.90:6803/4758
> mon.sv-dev-ha02 addr 10.20.100.91:6789/0 clock skew 0.0858782s > max 0.05s (latency 0.00546217s)
> mon.sv-dev-ha03 addr 10.20.100.92:6789/0 clock skew 0.0852838s > max 0.05s (latency 0.00533693s)
>
> Any help on how to start troubleshooting this issue would be appreciated.
>
> Cheers,
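
For reference, here is a minimal sketch of one way to get the clocks in
sync on CentOS 6 with ntpd, run on each monitor host. It assumes the
servers can reach the public NTP pool; if they can't, point ntpdate and
ntp.conf at your own internal time source instead:

    # install ntpd and stop it so the clock can be stepped in one jump
    yum install -y ntp
    service ntpd stop
    # step the clock once (assumes outbound NTP to the public pool is allowed)
    ntpdate pool.ntp.org
    # keep the clocks in sync from now on, and across reboots
    service ntpd start
    chkconfig ntpd on
    # give the monitors a little while, then re-check
    ceph health detail

The 0.05s threshold in the warning is the monitors' default clock drift
allowance; it can be raised in ceph.conf, but keeping the clocks actually
synced is the better fix.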