I was under the impression that "ceph-disk activate" would take care of
setting OSD weights. In fact, the documentation for adding OSDs, the
"short form", only talks about running ceph-disk prepare and activate:
http://ceph.com/docs/master/install/manual-deployment/#adding-osds

This is also how the Ceph cookbook provisions OSDs
(https://github.com/ceph/ceph-cookbook), and we have been using it
successfully in other scenarios without having to set weights manually.

Sergio

On Wed, May 27, 2015 at 2:09 AM, Christian Balzer <chibi at gol.com> wrote:
>
> Hello,
>
> your problem is of course that the weight is 0 for all your OSDs.
> Thus no data can be placed anywhere at all.
>
> You will want to re-read the manual deployment documentation or dissect
> ceph-deploy/ceph-disk more.
> Your script misses the "crush add" bit of that process:
>
>   ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
>
> like
>
>   "ceph osd crush add osd.0 1 host=host01"
>
> Christian
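
A minimal sketch of the CRUSH step Christian describes, using the host
bucket names that appear in the "ceph osd tree" output further down; the
weight of 1 per OSD follows his example rather than the usual
size-in-TiB convention:

    # Give each OSD a non-zero CRUSH weight so PGs can be mapped onto it.
    # "ceph osd crush add" adds or updates an item's weight and location,
    # so it also works here, where the OSDs already sit under their host
    # buckets with weight 0 ("ceph osd crush reweight osd.N 1" would do
    # the same for items that are already placed).
    ceph osd crush add osd.0 1 host=cephscriptdeplcindervol01
    ceph osd crush add osd.1 1 host=cephscriptdeplcindervol02
    ceph osd crush add osd.2 1 host=cephscriptdeplcindervol03

    # The weights should now be non-zero, and the PGs should peer and
    # eventually reach active+clean.
    ceph osd tree
    ceph -s
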
>
> On Tue, 26 May 2015 17:29:52 +0000 Johanni Thunstrom wrote:
>
> > Dear Ceph Team,
> >
> > Our cluster includes three Ceph nodes with 1 MON and 1 OSD each. All
> > nodes are running on CentOS 6.5 (kernel 2.6.32) VMs in a testing
> > cluster, not production. The script we're using is a simplified
> > sequence of steps that does more or less what the ceph-cookbook does.
> > Using OpenStack Cinder, we have attached a 10G block volume to each
> > node in order to set up the OSD. After running our ceph cluster
> > initialization script (pasted below), our cluster has a status of
> > HEALTH_WARN and a PG status of incomplete. Additionally, all PGs on
> > every Ceph node have the same acting and up set: [0]. Is this an
> > indicator that the PGs have not even started the creating state, since
> > not every OSD has the id 0 yet they all report 0 as their up and
> > acting OSD? Additionally, the weight of all OSDs is 0. Overall, the
> > OSDs appear to be up and in. The network appears to be fine; we are
> > able to ping and telnet to each server from one another.
> >
> > In order to isolate our problem, we tried replacing the attached
> > Cinder volume with a 10G xfs-formatted file mounted at /ceph-data. We
> > set OSD_PATH=/ceph-data and JOURNAL_PATH=/ceph-data/journal, and kept
> > the rest of our setup_ceph.sh script the same. Our ceph cluster was
> > then able to reach a status of HEALTH_OK and all PGs were
> > active+clean.
> >
> > What seems to be missing is the communication between the OSDs to
> > replicate/create the PGs correctly. Any advice on what's blocking the
> > PGs from reaching an active+clean state? We are very stumped as to why
> > the cluster using an attached Cinder volume fails to reach HEALTH_OK.
> >
> > If I left out any important information or explanation of how the ceph
> > cluster was created, let me know. Thank you!
> >
> > Sincerely,
> > Johanni B. Thunstrom
> >
> > Health Output:
> >
> > ceph -s
> >     cluster cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
> >      health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
> >      monmap e3: 3 mons at {cephscriptdeplcindervol01=10.98.66.235:6789/0,cephscriptdeplcindervol02=10.98.66.229:6789/0,cephscriptdeplcindervol03=10.98.66.226:6789/0},
> >             election epoch 6, quorum 0,1,2 cephscriptdeplcindervol03,cephscriptdeplcindervol02,cephscriptdeplcindervol01
> >      osdmap e11: 3 osds: 3 up, 3 in
> >       pgmap v23: 192 pgs, 3 pools, 0 bytes data, 0 objects
> >             101608 kB used, 15227 MB / 15326 MB avail
> >                  192 incomplete
> >
> > ceph health detail
> > HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
> > pg 1.2c is stuck inactive since forever, current state incomplete, last acting [0]
> > pg 0.2d is stuck inactive since forever, current state incomplete, last acting [0]
> > ...
> > pg 0.2e is stuck unclean since forever, current state incomplete, last acting [0]
> > pg 1.2f is stuck unclean since forever, current state incomplete, last acting [0]
> > pg 2.2c is stuck unclean since forever, current state incomplete, last acting [0]
> > pg 2.2f is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > ...
> > pg 1.30 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 0.31 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 2.32 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 1.31 is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 0.30 is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 2.2d is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 1.2e is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 0.2f is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 2.2c is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 1.2f is incomplete, acting [0] (reducing pool metadata min_size from 2 may help; search ceph.com/docs for 'incomplete')
> > pg 0.2e is incomplete, acting [0] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
> >
> > ceph mon dump
> > dumped monmap epoch 3
> > epoch 3
> > fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
> > last_changed 2015-05-18 23:10:39.218552
> > created 0.000000
> > 0: 10.98.66.226:6789/0 mon.cephscriptdeplcindervol03
> > 1: 10.98.66.229:6789/0 mon.cephscriptdeplcindervol02
> > 2: 10.98.66.235:6789/0 mon.cephscriptdeplcindervol01
> >
> > ceph osd dump
> > epoch 11
> > fsid cbbcfd09-9e8e-4cd1-905f-4b8e0fdb48cf
> > created 2015-05-18 22:35:14.823379
> > modified 2015-05-18 23:10:59.037467
> > flags
> > pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0
> > pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> > pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> > max_osd 3
> > osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 10.98.66.235:6800/3959 10.98.66.235:6801/3959 10.98.66.235:6802/3959 10.98.66.235:6803/3959 exists,up 71c866d3-2163-4574-a0aa-a4e0fa8c3569
> > osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 10.98.66.229:6800/4137 10.98.66.229:6801/4137 10.98.66.229:6802/4137 10.98.66.229:6803/4137 exists,up 1ee644fc-3fc7-4f3b-9e5b-96ba6a8afb99
> > osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 10.98.66.226:6800/4139 10.98.66.226:6801/4139 10.98.66.226:6802/4139 10.98.66.226:6803/4139 exists,up 6bee9a39-b909-483f-a5a0-ed4e1b016638
> >
> > ceph osd tree
> > # id  weight  type name                          up/down  reweight
> > -1    0       root default
> > -2    0           host cephscriptdeplcindervol01
> > 0     0               osd.0                      up       1
> > -3    0           host cephscriptdeplcindervol02
> > 1     0               osd.1                      up       1
> > -4    0           host cephscriptdeplcindervol03
> > 2     0               osd.2                      up       1
> >
> > *on second ceph node
> > ceph pg map 0.1f
> > osdmap e11 pg 0.1f (0.1f) -> up [0] acting [0]
> >
> > *on first (bootstrap) ceph node
> > ceph pg map 0.1f
> > osdmap e11 pg 0.1f (0.1f) -> up [0] acting [0]
> >
> > ceph pg 0.1f query
> > { "state": "incomplete",
> >   "epoch": 11,
> >   "up": [
> >         0],
> >   "acting": [
> >         0],
> >   "info": { "pgid": "0.1f",
> >       "last_update": "0'0",
> >       "last_complete": "0'0",
> >       "log_tail": "0'0",
> >       "last_user_version": 0,
> >       "last_backfill": "MAX",
> >       "purged_snaps": "[]",
> >       "history": { "epoch_created": 1,
> >           "last_epoch_started": 0,
> >           "last_epoch_clean": 1,
> >           "last_epoch_split": 0,
> >           "same_up_since": 4,
> >           "same_interval_since": 4,
> >           "same_primary_since": 4,
> >           "last_scrub": "0'0",
> >           "last_scrub_stamp": "2015-05-18 22:35:38.460878",
> >           "last_deep_scrub": "0'0",
> >           "last_deep_scrub_stamp": "2015-05-18 22:35:38.460878",
> >           "last_clean_scrub_stamp": "0.000000"},
> >       "stats": { "version": "0'0",
> >           "reported_seq": "10",
> >           "reported_epoch": "11",
> >           "state": "incomplete",
> >           "last_fresh": "2015-05-18 23:10:59.047056",
> >           "last_change": "2015-05-18 22:35:38.461314",
> >           "last_active": "0.000000",
> >           "last_clean": "0.000000",
> >           "last_became_active": "0.000000",
> >           "last_unstale": "2015-05-18 23:10:59.047056",
> >           "mapping_epoch": 4,
> >           "log_start": "0'0",
> >           "ondisk_log_start": "0'0",
> >           "created": 1,
> >           "last_epoch_clean": 1,
> >           "parent": "0.0",
> >           "parent_split_bits": 0,
> >           "last_scrub": "0'0",
> >           "last_scrub_stamp": "2015-05-18 22:35:38.460878",
> >           "last_deep_scrub": "0'0",
> >           "last_deep_scrub_stamp": "2015-05-18 22:35:38.460878",
> >           "last_clean_scrub_stamp": "0.000000",
> >           "log_size": 0,
> >           "ondisk_log_size": 0,
> >           "stats_invalid": "0",
> >           "stat_sum": { "num_bytes": 0,
> >               "num_objects": 0,
> >               "num_object_clones": 0,
> >               "num_object_copies": 0,
> >               "num_objects_missing_on_primary": 0,
> >               "num_objects_degraded": 0,
> >               "num_objects_unfound": 0,
> >               "num_objects_dirty": 0,
> >               "num_whiteouts": 0,
> >               "num_read": 0,
> >               "num_read_kb": 0,
> >               "num_write": 0,
> >               "num_write_kb": 0,
> >               "num_scrub_errors": 0,
> >               "num_shallow_scrub_errors": 0,
> >               "num_deep_scrub_errors": 0,
> >               "num_objects_recovered": 0,
> >               "num_bytes_recovered": 0,
> >               "num_keys_recovered": 0,
> >               "num_objects_omap": 0,
> >               "num_objects_hit_set_archive": 0},
> >           "stat_cat_sum": {},
> >           "up": [
> >                 0],
> >           "acting": [
> >                 0],
> > "up_primary": 0, > > "acting_primary": 0}, > > "empty": 1, > > "dne": 0, > > "incomplete": 0, > > "last_epoch_started": 0, > > "hit_set_history": { "current_last_update": "0'0", > > "current_last_stamp": "0.000000", > > "current_info": { "begin": "0.000000", > > "end": "0.000000", > > "version": "0'0"}, > > "history": []}}, > > "peer_info": [], > > "recovery_state": [ > > { "name": "Started\/Primary\/Peering", > > "enter_time": "2015-05-18 22:35:38.461150", > > "past_intervals": [ > > { "first": 1, > > "last": 3, > > "maybe_went_rw": 0, > > "up": [], > > "acting": [ > > -1, > > -1]}], > > "probing_osds": [ > > "0"], > > "down_osds_we_would_probe": [], > > "peering_blocked_by": []}, > > { "name": "Started", > > "enter_time": "2015-05-18 22:35:38.461070"}], > > "agent_state": {}} > > > > Ceph pg dump > > ?. > > .. > > . > > 0'0 2015-05-18 22:35:38.469318 > > 2.2c 00 00 00 0incomplete 2015-05-18 22:35:43.2686810'0 11:10[0] 0[0] > > 00'0 2015-05-18 22:35:43.2682160'0 2015-05-18 22:35:43.268216 1.2f 00 00 > > 00 0incomplete 2015-05-18 22:35:40.4059080'0 11:10[0] 0[0] 00'0 > > 2015-05-18 22:35:40.4055270'0 2015-05-18 22:35:40.405527 0.2e 00 00 00 > > 0incomplete 2015-05-18 22:35:38.4692700'0 11:10[0] 0[0] 00'0 2015-05-18 > > 22:35:38.4688330'0 2015-05-18 22:35:38.468833 pool 0 00 00 00 0 pool 1 > > 00 00 00 0 pool 2 00 00 00 0 sum 00 00 00 0 > > osdstat kbusedkbavail kbhb in hb out > > 0 347045196892 5231596[] [] > > 1 334525198144 5231596[0] [] > > 2 334525198144 5231596[0,1] [] > > sum 10160815593180 15694788 > > > > =========================== > > > > > > $ cat setup_ceph.sh > > > > #!/bin/bash > > > > # > > > ------------------------------------------------------------------------------ > > > > # This script sets up a Ceph node as part of a Ceph cluster. > > > > # > > > > # This is part of an experiment to ensure we run a Ceph cluster on Docker > > > > # containers as well as use this cluster as a back-end storage for > > OpenStack > > > > # services that are also running on containers. A fully-automated > > deployment > > > > # will later be implemented using Chef cookbooks. > > > > # > > > ------------------------------------------------------------------------------ > > > > set -e > > > > set -x > > > > if [ "$1" == "-h" ] || [ "$1" == "--help" ]; then > > > > cat << END_USAGE_INFO > > > > Usage: $0 [ <initial monitors> [ <bootstrap IP> [ <monitor secret> > > [ <fsid> [ <monitor list> [ <data path> [ <journal path> ]]]]]]] > > > > Where: initial monitors - comma-separated list of IDs of monitors allowed > > > > to start the cluster (default: <this host's name>) > > > > bootstrap IP - IP address of any other monitor that is already > > > > part of the cluster (default: none) > > > > monitor secret - monitor secret (randomly generated if not given) > > > > fsid - cluster ID (randomly generated if not given) > > > > monitor list - comma-separated list of FQDN of known monitors > > > > (default: <this host's FQDN>) > > > > data path - path to OSD data device or directory > > > > (default: /ceph-data/osd) > > > > journal path - path to OSD journal device or file > > > > (default: /ceph-data/journal) > > > > END_USAGE_INFO > > > > exit 0 > > > > fi > > > > > > if [ "$(id -u)" != "0" ]; then > > > > echo "This script must be run as root." 
> >     exit 1
> > fi
> >
> > INITIAL_MONS=$1
> > BOOTSTRAP_IP=$2
> > MON_SECRET=$3
> > FSID=$4
> > MON_LIST=$5
> > OSD_PATH=$6
> > JOURNAL_PATH=$7
> > CLUSTER_NAME="ceph"
> >
> > THIS_HOST_FQDN=$(hostname -f)
> > THIS_HOST_NAME=$(hostname -s)
> > THIS_HOST_IP=$(hostname -i)
> >
> > yum install -y ceph xfsprogs
> >
> > if [ -z "${INITIAL_MONS}" ]; then
> >     INITIAL_MONS=${THIS_HOST_NAME}
> > fi
> > if [ -z "${MON_SECRET}" ]; then
> >     MON_SECRET=$(ceph-authtool /dev/stdout --name=mon. --gen-key | awk -F "key = " '/key/{print $2}')
> > fi
> > if [ -z "${FSID}" ]; then
> >     FSID=$(uuidgen)
> >     NEW_CLUSTER=true
> > else
> >     NEW_CLUSTER=false
> > fi
> > if [ -z "${MON_LIST}" ]; then
> >     MON_LIST=${THIS_HOST_FQDN}
> > fi
> > if [ -z "${OSD_PATH}" ]; then
> >     OSD_PATH="/ceph-data/osd"
> >     rm -rf ${OSD_PATH}
> >     mkdir -p ${OSD_PATH}
> > fi
> > if [ -z "${JOURNAL_PATH}" ]; then
> >     JOURNAL_PATH="/ceph-data/journal"
> >     rm -f ${JOURNAL_PATH}
> > fi
> >
> > cat > /etc/ceph/ceph.conf << END_CEPH_CONF
> > [global]
> > fsid = ${FSID}
> > mon initial members = ${INITIAL_MONS}
> > mon host = ${MON_LIST}
> > END_CEPH_CONF
> >
> > mkdir -p /var/run/ceph
> > chmod 755 /var/run/ceph
> > mkdir -p /var/lib/ceph/mon/${CLUSTER_NAME}-${THIS_HOST_NAME}
> > chmod 755 /var/lib/ceph/mon/${CLUSTER_NAME}-${THIS_HOST_NAME}
> >
> > TMP_KEY=/tmp/${CLUSTER_NAME}-${THIS_HOST_NAME}.mon.keyring
> > ceph-authtool ${TMP_KEY} --create-keyring --name=mon. --add-key=${MON_SECRET} --cap mon 'allow *'
> > if $NEW_CLUSTER ; then
> >     ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
> >     ceph-authtool ${TMP_KEY} --import-keyring /etc/ceph/ceph.client.admin.keyring
> > fi
> > ceph-mon --mkfs -i ${THIS_HOST_NAME} --keyring ${TMP_KEY}
> > rm -f ${TMP_KEY}
> > touch /var/lib/ceph/mon/${CLUSTER_NAME}-${THIS_HOST_NAME}/{done,sysvinit}
> >
> > /etc/init.d/ceph start mon.${THIS_HOST_NAME}
> >
> > if [ ! -z "${BOOTSTRAP_IP}" ]; then
> >     ceph --admin-daemon /var/run/ceph/ceph-mon.${THIS_HOST_NAME}.asok add_bootstrap_peer_hint ${BOOTSTRAP_IP}
> > else
> >     ceph --admin-daemon /var/run/ceph/ceph-mon.${THIS_HOST_NAME}.asok add_bootstrap_peer_hint ${THIS_HOST_IP}
> > fi
> >
> > /usr/sbin/ceph-disk -v prepare ${OSD_PATH} ${JOURNAL_PATH}
> > sleep 20
> >
> > # A call to "ceph-disk activate" is not necessary, since udev will trigger
> > # activation once the disk is prepared.
> > # /usr/sbin/ceph-disk -v activate ${OSD_PATH}
> >
> > set +x
> > cat << INSTALL_END
> > Installation completed successfully
> > Monitor secret:       ${MON_SECRET}
> > Cluster FSID:         ${FSID}
> > This node name:       ${THIS_HOST_NAME}
> > This node IP address: ${THIS_HOST_IP}
> > INSTALL_END
> >
> > exit 0
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
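
If the weight is to be set from setup_ceph.sh itself rather than by hand,
one possible ending for the script is sketched below. It is only a sketch:
the OSD-id lookup assumes a single OSD per node, mounted by udev under
/var/lib/ceph/osd/ after "ceph-disk prepare", and it assumes the node can
run admin ceph commands; the weight of 1 mirrors the example above.

    # Hypothetical addition after the "sleep 20" in setup_ceph.sh:
    # once udev has activated the newly prepared OSD, look up its id from
    # the mount point and give it a non-zero CRUSH weight under this host.
    OSD_ID=$(ls /var/lib/ceph/osd/ | sed -n "s/^${CLUSTER_NAME}-//p" | head -n 1)
    if [ -n "${OSD_ID}" ]; then
        ceph osd crush add osd.${OSD_ID} 1 host=${THIS_HOST_NAME}
    else
        echo "OSD not activated yet; set its CRUSH weight manually." >&2
    fi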