Hello everyone,

I am testing ceph-deploy on CentOS 6.3 and I am getting errors. I have a simple one-node setup as follows:

OS: CentOS 6.3 (kernel 3.5 and also kernel 2.6.32-279.el6.x86_64)
Journal partition: 2 GB on /dev/sdb (label=gpt)
selinux=OFF
iptables=OFF
Number of OSDs: 2

Test 1:

ceph-deploy new gclient158
ceph-deploy mon create gclient158
ceph-deploy disk zap gclient158:/dev/sdc
ceph-deploy disk zap gclient158:/dev/sdd
ceph-deploy gatherkeys gclient158
ceph-deploy mds create gclient158
ceph-deploy osd prepare gclient158:sdc:/dev/sdb1
ceph-deploy osd prepare gclient158:sdd:/dev/sdb2
ceph-deploy osd activate gclient158:/dev/sdc:/dev/sdb1
ceph-deploy osd activate gclient158:/dev/sdd:/dev/sdb2

The results of the above ceph-deploy commands are shown below. The two OSDs are running, but "ceph health" never shows HEALTH_OK; it stays in HEALTH_WARN forever and the cluster remains degraded. By the way, /var/log/ceph/ceph-osd.0.log and /var/log/ceph/ceph-osd.1.log contain no real errors. This behavior is the same on kernel 3.5 and on 2.6.32-279.el6.x86_64. What am I missing?

[root@gclient158 ~]# ps -elf | grep ceph
5 S root 3124 1 0 80 0 - 40727 futex_ 10:49 ? 00:00:00 /usr/bin/ceph-mon -i gclient158 --pid-file /var/run/ceph/mon.gclient158.pid -c /etc/ceph/ceph.conf
5 S root 3472 1 0 80 0 - 41194 futex_ 10:49 ? 00:00:00 /usr/bin/ceph-mds -i gclient158 --pid-file /var/run/ceph/mds.gclient158.pid -c /etc/ceph/ceph.conf
5 S root 4035 1 1 78 -2 - 115119 futex_ 10:50 ? 00:00:00 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
5 S root 4769 1 0 78 -2 - 112304 futex_ 10:50 ? 00:00:00 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf
0 S root 5025 2710 0 80 0 - 25811 pipe_w 10:50 pts/0 00:00:00 grep ceph

[root@gclient158 ~]# ceph osd tree
# id    weight   type name            up/down  reweight
-1      0.14     root default
-2      0.14         host gclient158
0       0.06999          osd.0        up       1
1       0.06999          osd.1        up       1

[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s
[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s
[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s

As noted, the OSD logs contain no real errors, but one thing that does happen after the osd prepare and activate commands is the error below:

Traceback (most recent call last):
  File "/usr/sbin/ceph-deploy", line 8, in <module>
    load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')()
  File "/root/ceph-deploy/ceph_deploy/cli.py", line 112, in main
    return args.func(args)
  File "/root/ceph-deploy/ceph_deploy/osd.py", line 426, in osd
    prepare(args, cfg, activate_prepared_disk=False)
  File "/root/ceph-deploy/ceph_deploy/osd.py", line 273, in prepare
    s = '{} returned {}\n{}\n{}'.format(cmd, ret, out, err)
ValueError: zero length field name in format

The above error probably has something to do with the journal device; I got the same error with the journal device labeled gpt and also labeled msdos. Please, what am I missing here, and why does the cluster never reach HEALTH_OK?
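A side note on that traceback, which is only my guess: CentOS 6.3 ships Python 2.6, and on 2.6 str.format() does not accept empty '{}' placeholders (automatic field numbering only arrived in 2.7), which is exactly what the format string in osd.py line 273 uses. That would explain why I see the same ValueError no matter how the journal device is labeled. A quick check on the node (assuming /usr/bin/python is the stock 2.6 interpreter):

python -c "print '{} returned {}'.format('cmd', 0)"
# on Python 2.6 this fails with: ValueError: zero length field name in format
python -c "print '{0} returned {1}'.format('cmd', 0)"
# explicit field indices work on 2.6 and print: cmd returned 0

If that is what is happening, the format string is only hit on ceph-deploy's error-reporting path, so the real output of the failing remote command is being swallowed by the ValueError.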
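On the HEALTH_WARN question, also an assumption on my part: with a single host and the default replication size of 2, the default CRUSH rule wants each replica on a different host, which one node can never satisfy, so the PGs stay degraded forever. Two things that might get a one-node test cluster to HEALTH_OK (the pool names below assume the stock data/metadata/rbd pools):

ceph osd pool set data size 1
ceph osd pool set metadata size 1
ceph osd pool set rbd size 1
# ...or set "osd crush chooseleaf type = 0" under [global] in ceph.conf before
# creating the cluster, so CRUSH only separates replicas across OSDs, not hosts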
TEST 2: the setup for this test is the same as above, except that I used the same disk for both the ceph data and the journal, as follows:

ceph-deploy osd prepare gclient158:/dev/sdc
ceph-deploy osd prepare gclient158:/dev/sdd
ceph-deploy osd activate gclient158:/dev/sdc
ceph-deploy osd activate gclient158:/dev/sdd

For test 2 I do not get the error from test 1, but the OSDs fail to start and both OSD log files contain this error:

2013-05-21 11:54:24.806747 7f26cfa26780 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 942af534-ccc0-4843-8598-79420592317a, invalid (someone else's?) journal
2013-05-21 11:54:24.806784 7f26cfa26780 -1 filestore(/var/lib/ceph/tmp/mnt.3YsEmH) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.3YsEmH/journal: (22) Invalid argument
2013-05-21 11:54:24.806802 7f26cfa26780 -1 OSD::mkfs: FileStore::mkfs failed with error -22
2013-05-21 11:54:24.806838 7f26cfa26780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.3YsEmH: (22) Invalid argument

What am I missing? Any suggestions on both test cases would be appreciated.

Thank you,
Isaac
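One more guess on the test 2 error: the "ondisk fsid 00000000-... doesn't match expected ..." line reads as if ceph-osd found stale or unreadable data where it expected its freshly created journal, possibly left over from the earlier attempts on the same disks. A cleanup I may try before re-running the prepare/activate steps (destructive, so only on disks that can be wiped):

ceph-deploy disk zap gclient158:/dev/sdc
ceph-deploy disk zap gclient158:/dev/sdd
# zero the first few MB as well, to kill any leftover partition/journal metadata
dd if=/dev/zero of=/dev/sdc bs=1M count=10
dd if=/dev/zero of=/dev/sdd bs=1M count=10
# after "osd prepare", confirm the data and journal partitions actually exist
# before running "osd activate"
parted /dev/sdc print
parted /dev/sdd print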