Dear Ceph Team,

Our cluster consists of three Ceph nodes, each running 1 MON and 1 OSD. All nodes are CentOS 6.5 (kernel 2.6.32) VMs in a testing cluster, not production. The script we're using is a simplified sequence of steps that does more or less what the ceph-cookbook does (the script is attached; a rough sketch of its OSD-initialization steps is included at the end of this mail). Using OpenStack Cinder, we attached a 10G block volume to each node to hold the OSD data.

After running our cluster initialization script, the cluster reports HEALTH_WARN and every PG is stuck in the incomplete state. In addition, every PG on every node has the same up and acting set: [0]. Is this an indicator that the PGs have not even entered the creating state? Not every OSD has id 0, yet they all report 0 as their up and acting OSD. The CRUSH weight of every OSD is also 0 (see "ceph osd tree" below; the reweight test we would run is also at the end of this mail). Otherwise the OSDs appear to be up and in, and the network looks fine: we can ping and telnet to each server from every other.

To isolate the problem, we replaced the attached Cinder volume with a 10G XFS-formatted file mounted at /ceph-data, set OSD_PATH=/ceph-data and JOURNAL_PATH=/ceph-data/journal, and kept the rest of setup_ceph.sh the same. With that change the cluster reached HEALTH_OK and all PGs became active+clean.

What seems to be missing is the communication between the OSDs needed to replicate and create the PGs correctly. Any advice on what is blocking the PGs from reaching an active+clean state? We are stumped as to why the cluster fails to reach HEALTH_OK only when it uses an attached Cinder volume. If I left out any important information about how the cluster was created, let me know.

Thank you!

Sincerely,
Johanni B. Thunstrom


Health Output:

ceph -s
    cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e3: 3 mons at {cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0}, election epoch 6, quorum 0,1,2 cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
            101608 kB used, 15227 MB / 15326 MB avail
                 192 incomplete

ceph health detail
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
pg 1.2c is stuck inactive since forever, current state incomplete, last acting [0]
pg 0.2d is stuck inactive since forever, current state incomplete, last acting [0]
...
pg 0.2e is stuck unclean since forever, current state incomplete, last acting [0]
pg 1.2f is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2c is stuck unclean since forever, current state incomplete, last acting [0]
pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
...
pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2f is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.2c is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 1.2f is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 0.2e is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')

ceph mon dump
dumped monmap epoch 3
epoch 3
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
last_changed 2015-05-18 23:10:39.218552
created 0.000000
0: 10.98.66.226:6789/0 mon.cephscriptdeplcindervol03
1: 10.98.66.229:6789/0 mon.cephscriptdeplcindervol02
2: 10.98.66.235:6789/0 mon.cephscriptdeplcindervol01

ceph osd dump
epoch 11
fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
created 2015-05-18 22:35:14.823379
modified 2015-05-18 23:10:59.037467
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
max_osd 3
osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 10.98.66.235:6800/3959 10.98.66.235:6801/3959 10.98.66.235:6802/3959 10.98.66.235:6803/3959 exists,up 71c866d3-2163-4574-a0aa-a4e0fa8c3569
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 10.98.66.229:6800/4137 10.98.66.229:6801/4137 10.98.66.229:6802/4137 10.98.66.229:6803/4137 exists,up 1ee644fc-3fc7-4f3b-9e5b-96ba6a8afb99
osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 10.98.66.226:6800/4139 10.98.66.226:6801/4139 10.98.66.226:6802/4139 10.98.66.226:6803/4139 exists,up 6bee9a39-b909-483f-a5a0-ed4e1b016638

ceph osd tree
# id   weight   type name                           up/down   reweight
-1     0        root default
-2     0            host cephscriptdeplcindervol01
0      0                osd.0                       up        1
-3     0            host cephscriptdeplcindervol02
1      0                osd.1                       up        1
-4     0            host cephscriptdeplcindervol03
2      0                osd.2                       up        1

* on second ceph node:
ceph pg map 0.1f
osdmap e11 pg 0.1f (0.1f) -> up [0] acting [0]

* on first (bootstrap) ceph node:
ceph pg map 0.1f
osdmap e11 pg 0.1f (0.1f) -> up [0] acting [0]

ceph pg 0.1f query
{ "state": "incomplete",
  "epoch": 11,
  "up": [0],
  "acting": [0],
  "info": {
      "pgid": "0.1f",
      "last_update": "0'0",
      "last_complete": "0'0",
      "log_tail": "0'0",
      "last_user_version": 0,
      "last_backfill": "MAX",
      "purged_snaps": "[]",
"history": { "epoch_created": 1, "last_epoch_started": 0, "last_epoch_clean": 1, "last_epoch_split": 0, "same_up_since": 4, "same_interval_since": 4, "same_primary_since": 4, "last_scrub": "0'0", "last_scrub_stamp": "2015-05-18 22:35:38.460878", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2015-05-18 22:35:38.460878", "last_clean_scrub_stamp": "0.000000"}, "stats": { "version": "0'0", "reported_seq": "10", "reported_epoch": "11", "state": "incomplete", "last_fresh": "2015-05-18 23:10:59.047056", "last_change": "2015-05-18 22:35:38.461314", "last_active": "0.000000", "last_clean": "0.000000", "last_became_active": "0.000000", "last_unstale": "2015-05-18 23:10:59.047056", "mapping_epoch": 4, "log_start": "0'0", "ondisk_log_start": "0'0", "created": 1, "last_epoch_clean": 1, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "0'0", "last_scrub_stamp": "2015-05-18 22:35:38.460878", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2015-05-18 22:35:38.460878", "last_clean_scrub_stamp": "0.000000", "log_size": 0, "ondisk_log_size": 0, "stats_invalid": "0", "stat_sum": { "num_bytes": 0, "num_objects": 0, "num_object_clones": 0, "num_object_copies": 0, "num_objects_missing_on_primary": 0, "num_objects_degraded": 0, "num_objects_unfound": 0, "num_objects_dirty": 0, "num_whiteouts": 0, "num_read": 0, "num_read_kb": 0, "num_write": 0, "num_write_kb": 0, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 0, "num_bytes_recovered": 0, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0}, "stat_cat_sum": {}, "up": [ 0], "acting": [ 0], "up_primary": 0, "acting_primary": 0}, "empty": 1, "dne": 0, "incomplete": 0, "last_epoch_started": 0, "hit_set_history": { "current_last_update": "0'0", "current_last_stamp": "0.000000", "current_info": { "begin": "0.000000", "end": "0.000000", "version": "0'0"}, "history": []}}, "peer_info": [], "recovery_state": [ { "name": "Started\/Primary\/Peering", "enter_time": "2015-05-18 22:35:38.461150", "past_intervals": [ { "first": 1, "last": 3, "maybe_went_rw": 0, "up": [], "acting": [ -1, -1]}], "probing_osds": [ "0"], "down_osds_we_would_probe": [], "peering_blocked_by": []}, { "name": "Started", "enter_time": "2015-05-18 22:35:38.461070"}], "agent_state": {}} Ceph pg dump ?. .. . 0'0 2015-05-18 22:35:38.469318 2.2c 0 0 0 0 0 0 0 incomplete 2015-05-18 22:35:43.268681 0'0 11:10 [0] 0 [0] 0 0'0 2015-05-18 22:35:43.268216 0'0 2015-05-18 22:35:43.268216 1.2f 0 0 0 0 0 0 0 incomplete 2015-05-18 22:35:40.405908 0'0 11:10 [0] 0 [0] 0 0'0 2015-05-18 22:35:40.405527 0'0 2015-05-18 22:35:40.405527 0.2e 0 0 0 0 0 0 0 incomplete 2015-05-18 22:35:38.469270 0'0 11:10 [0] 0 [0] 0 0'0 2015-05-18 22:35:38.468833 0'0 2015-05-18 22:35:38.468833 pool 0 0 0 0 0 0 0 0 pool 1 0 0 0 0 0 0 0 pool 2 0 0 0 0 0 0 0 sum 0 0 0 0 0 0 0 osdstat kbused kbavail kb hb in hb out 0 34704 5196892 5231596 [] [] 1 33452 5198144 5231596 [0] [] 2 33452 5198144 5231596 [0,1] [] sum 101608 15593180 15694788 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150526/4fd141d3/attachment.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: setup_ceph.sh Type: application/octet-stream Size: 4026 bytes Desc: setup_ceph.sh URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150526/4fd141d3/attachment.obj>