Hello everyone,

I am testing ceph-deploy on CentOS 6.3 and I am getting errors. I have a simple one-node setup as follows:

OS: CentOS 6.3 (kernel 3.5 and also kernel 2.6.32-279.el6.x86_64)
Journal partition: 2 GB on /dev/sdb (label=gpt)
selinux=OFF
iptables=OFF
Number of OSDs: 2

Test 1:

ceph-deploy new gclient158
ceph-deploy mon create gclient158
ceph-deploy disk zap gclient158:/dev/sdc
ceph-deploy disk zap gclient158:/dev/sdd
ceph-deploy gatherkeys gclient158
ceph-deploy mds create gclient158
ceph-deploy osd prepare gclient158:sdc:/dev/sdb1
ceph-deploy osd prepare gclient158:sdd:/dev/sdb2
ceph-deploy osd activate gclient158:/dev/sdc:/dev/sdb1
ceph-deploy osd activate gclient158:/dev/sdd:/dev/sdb2

The results of the above ceph-deploy commands are shown below. The two OSDs are running, but "ceph health" never shows HEALTH_OK; it stays in HEALTH_WARN forever and the cluster remains degraded. By the way, /var/log/ceph/ceph-osd.0.log and /var/log/ceph/ceph-osd.1.log contain no real errors. This behavior is the same on kernel 3.5 and on 2.6.32-279.el6.x86_64. What am I missing?

[root@gclient158 ~]# ps -elf | grep ceph
5 S root 3124 1 0 80 0 - 40727 futex_ 10:49 ? 00:00:00 /usr/bin/ceph-mon -i gclient158 --pid-file /var/run/ceph/mon.gclient158.pid -c /etc/ceph/ceph.conf
5 S root 3472 1 0 80 0 - 41194 futex_ 10:49 ? 00:00:00 /usr/bin/ceph-mds -i gclient158 --pid-file /var/run/ceph/mds.gclient158.pid -c /etc/ceph/ceph.conf
5 S root 4035 1 1 78 -2 - 115119 futex_ 10:50 ? 00:00:00 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
5 S root 4769 1 0 78 -2 - 112304 futex_ 10:50 ? 00:00:00 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf
0 S root 5025 2710 0 80 0 - 25811 pipe_w 10:50 pts/0 00:00:00 grep ceph

[root@gclient158 ~]# ceph osd tree
# id    weight   type name            up/down  reweight
-1      0.14     root default
-2      0.14         host gclient158
0       0.06999          osd.0        up       1
1       0.06999          osd.1        up       1

[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s
[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s
[root@gclient158 ~]# ceph health
HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 9/42 degraded (21.429%); recovering 2 o/s, 1492B/s

As noted, the OSD logs contain no real errors, but one thing that does happen after the osd prepare and activate commands is the error below:

Traceback (most recent call last):
  File "/usr/sbin/ceph-deploy", line 8, in <module>
    load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')()
  File "/root/ceph-deploy/ceph_deploy/cli.py", line 112, in main
    return args.func(args)
  File "/root/ceph-deploy/ceph_deploy/osd.py", line 426, in osd
    prepare(args, cfg, activate_prepared_disk=False)
  File "/root/ceph-deploy/ceph_deploy/osd.py", line 273, in prepare
    s = '{} returned {}\n{}\n{}'.format(cmd, ret, out, err)
ValueError: zero length field name in format

The above error probably has something to do with the journal device; I got the same error with the journal device labeled gpt and also labeled msdos. Please, what am I missing here, and why does the cluster never reach HEALTH_OK?
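A side note on that traceback, which is only my guess: CentOS 6.3 ships Python 2.6, and on 2.6 str.format() does not accept empty '{}' placeholders (automatic field numbering only arrived in 2.7), which is exactly what the format string in osd.py line 273 uses. That would explain why I see the same ValueError no matter how the journal device is labeled. A quick check on the node (assuming /usr/bin/python is the stock 2.6 interpreter):

python -c "print '{} returned {}'.format('cmd', 0)"
# on Python 2.6 this fails with: ValueError: zero length field name in format
python -c "print '{0} returned {1}'.format('cmd', 0)"
# explicit field indices work on 2.6 and print: cmd returned 0

If that is what is happening, the format string is only hit on ceph-deploy's error-reporting path, so the real output of the failing remote command is being swallowed by the ValueError.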
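On the HEALTH_WARN question, also an assumption on my part: with a single host and the default replication size of 2, the default CRUSH rule wants each replica on a different host, which one node can never satisfy, so the PGs stay degraded forever. Two things that might get a one-node test cluster to HEALTH_OK (the pool names below assume the stock data/metadata/rbd pools):

ceph osd pool set data size 1
ceph osd pool set metadata size 1
ceph osd pool set rbd size 1
# ...or set "osd crush chooseleaf type = 0" under [global] in ceph.conf before
# creating the cluster, so CRUSH only separates replicas across OSDs, not hosts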
TEST 2: the setup for this test is the same as above, except that I used the same disk for both the ceph data and the journal, as follows:

ceph-deploy osd prepare gclient158:/dev/sdc
ceph-deploy osd prepare gclient158:/dev/sdd
ceph-deploy osd activate gclient158:/dev/sdc
ceph-deploy osd activate gclient158:/dev/sdd

For test 2 I do not get the error from test 1, but the OSDs fail to start and both OSD log files contain this error:

2013-05-21 11:54:24.806747 7f26cfa26780 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 942af534-ccc0-4843-8598-79420592317a, invalid (someone else's?) journal
2013-05-21 11:54:24.806784 7f26cfa26780 -1 filestore(/var/lib/ceph/tmp/mnt.3YsEmH) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.3YsEmH/journal: (22) Invalid argument
2013-05-21 11:54:24.806802 7f26cfa26780 -1 OSD::mkfs: FileStore::mkfs failed with error -22
2013-05-21 11:54:24.806838 7f26cfa26780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.3YsEmH: (22) Invalid argument

What am I missing? Any suggestions on both test cases would be appreciated.

Thank you,
Isaac
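One more guess on the test 2 error: the "ondisk fsid 00000000-... doesn't match expected ..." line reads as if ceph-osd found stale or unreadable data where it expected its freshly created journal, possibly left over from the earlier attempts on the same disks. A cleanup I may try before re-running the prepare/activate steps (destructive, so only on disks that can be wiped):

ceph-deploy disk zap gclient158:/dev/sdc
ceph-deploy disk zap gclient158:/dev/sdd
# zero the first few MB as well, to kill any leftover partition/journal metadata
dd if=/dev/zero of=/dev/sdc bs=1M count=10
dd if=/dev/zero of=/dev/sdd bs=1M count=10
# after "osd prepare", confirm the data and journal partitions actually exist
# before running "osd activate"
parted /dev/sdc print
parted /dev/sdd print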