Re: osd down

Hi Pavel,

It looks like you have deployed your 2 OSDs on the same host. By default, the CRUSH map assigns each object to 2 OSDs that are on different hosts.

If you want this to work for testing, you’ll have to adapt your CRUSH map so that each copy is dispatched to a bucket of type ‘osd’ rather than ‘host’.

Being unable to find a candidate OSD according to your current CRUSH map is probably why your PGs remain stuck and inactive. I reproduced your setup and got the same result, but as soon as I modified the map, all the PGs came up active+clean.
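
For reference, the map change can be sketched roughly as follows. This is only a sketch, assuming the decompiled rule contains a line like "step chooseleaf firstn 0 type host"; the helper name use_osd_leaf and the file names in the comments are placeholders, not anything Ceph ships.

```shell
#!/bin/sh
# Hypothetical helper: read a decompiled CRUSH map on stdin and rewrite
# every chooseleaf step so replicas are spread across OSDs, not hosts.
use_osd_leaf() {
    sed -E 's/^([[:space:]]*step chooseleaf .* type )host[[:space:]]*$/\1osd/'
}

# Typical round trip on a test cluster (file names are placeholders):
#   ceph osd getcrushmap -o crush.bin
#   crushtool -d crush.bin -o crush.txt
#   use_osd_leaf < crush.txt > crush.new.txt
#   crushtool -c crush.new.txt -o crush.new.bin
#   ceph osd setcrushmap -i crush.new.bin
```

Only lines that are chooseleaf steps ending in "type host" are touched, so bucket definitions such as "host host1 { ... }" stay as they are. This is meant for single-host test setups only.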

mtc
JC



On Feb 16, 2014, at 14:02, Pavel V. Kaygorodov <pasha@xxxxxxxxx> wrote:

Hi!

I have tried, but the situation has not changed significantly:

# ceph -w
    cluster e90dfd37-98d1-45bb-a847-8590a5ed8e71
     health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 2/2 in osds are down
     monmap e1: 1 mons at {host1=172.17.0.4:6789/0}, election epoch 1, quorum 0 host1
     osdmap e9: 2 osds: 0 up, 2 in
      pgmap v10: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating
2014-02-16 17:25:29.872538 mon.0 [INF] osdmap e9: 2 osds: 0 up, 2 in

# ceph osd tree
# id    weight  type name       up/down reweight
-1      2       root default
-2      2               host host1
0       1                       osd.0   down    1
1       1                       osd.1   down    1

# ceph health
HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 2/2 in osds are down

ps showed both osd daemons running.
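
A quick way to pull the down OSDs out of that tree output is sketched below. It assumes the five-column layout printed above (id, weight, name, up/down, reweight); down_osds is just an illustrative name.

```shell
#!/bin/sh
# List the OSDs that "ceph osd tree" reports as down.
# Assumes OSD rows look like:  0  1  osd.0  down  1
down_osds() {
    awk '$3 ~ /^osd\./ && $4 == "down" { print $3 }'
}

# Usage on a live cluster:
#   ceph osd tree | down_osds
```

Header, root and host rows are skipped because their third field does not start with "osd.".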

Pavel.

On 17 Feb 2014, at 1:50, Karan Singh <karan.singh@xxxxxx> wrote:

Hi Pavel

  • Try adding at least one more OSD (the bare minimum), then set the pool replication to 2.
  • For osd.0, try   # ceph osd in osd.0   and, once the OSD is IN, try to bring the osd.0 service up.


In the end, both of your OSDs should be IN and UP so that your cluster can store data.
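
The two suggestions above could be scripted roughly like this. It is a dry-run sketch: plan_recovery is a made-up helper that only prints the commands, and the pool names data, metadata and rbd are an assumption (check the real names with "ceph osd lspools").

```shell
#!/bin/sh
# Print (rather than run) the commands for marking an OSD "in" and
# setting the replica count of each named pool to 2.
plan_recovery() {
    osd_id=$1
    shift
    echo "ceph osd in $osd_id"
    for pool in "$@"; do
        echo "ceph osd pool set $pool size 2"
    done
}

# Assumed pool names; replace them with the ones on your cluster.
plan_recovery 0 data metadata rbd
```

Review the printed lines first; piping them to sh applies them.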

Regards
Karan


On 16 Feb 2014, at 20:06, Pavel V. Kaygorodov <pasha@xxxxxxxxx> wrote:

Hi, All!

I am trying to set up Ceph from scratch, without a dedicated drive, with one mon and one OSD.
After all that, I see the following output of ceph osd tree:

# id    weight  type name       up/down reweight
-1      1       root default
-2      1               host host1
0       1                       osd.0   down    0

ceph -w:

   cluster e90dfd37-98d1-45bb-a847-8590a5ed8e71
    health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
    monmap e1: 1 mons at {host1=172.17.0.4:6789/0}, election epoch 1, quorum 0 host1
    osdmap e5: 1 osds: 0 up, 0 in
     pgmap v6: 192 pgs, 3 pools, 0 bytes data, 0 objects
           0 kB used, 0 kB / 0 kB avail
                192 creating

2014-02-16 13:27:30.095938 mon.0 [INF] osdmap e5: 1 osds: 0 up, 0 in

What could be wrong?
I see the daemons running, and nothing bad in the log files.

////////////////////////////////////

How to reproduce:
I cloned and compiled the sources on Debian jessie:

git clone --recursive -b v0.75 https://github.com/ceph/ceph.git
cd /ceph/ && ./autogen.sh && ./configure && make && make install

Everything seems ok.

I have created ceph.conf:

[global]

fsid = e90dfd37-98d1-45bb-a847-8590a5ed8e71
mon initial members = host1

auth cluster required = cephx
auth service required = cephx
auth client required = cephx

keyring = /data/ceph.client.admin.keyring

osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 0   
osd journal size = 1000

filestore xattr use omap = true

;journal dio = false
;journal aio = false

mon addr = ceph.dkctl
mon host = ceph.dkctl

log file = /data/logs/ceph.log

[mon]
mon data = ""
keyring = /data/ceph.mon.keyring
log file = /data/logs/mon0.log

[osd.0]
osd host    = host1
osd data    = /data/osd0
osd journal = /data/osd0.journal
log file    = /data/logs/osd0.log
keyring     = /data/ceph.osd0.keyring

///////////////////////////////

I initialized the mon and OSD using the following script:

/usr/local/bin/ceph-authtool --create-keyring /data/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
/usr/local/bin/ceph-authtool --create-keyring /data/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
/usr/local/bin/ceph-authtool /data/ceph.mon.keyring --import-keyring /data/ceph.client.admin.keyring
/usr/local/bin/monmaptool --create --add host1 `grep ceph /etc/hosts | awk '{print $1}'` --fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71 /data/monmap
/usr/local/bin/ceph-mon --mkfs -i host1 --monmap /data/monmap --keyring /data/ceph.mon.keyring
/usr/local/bin/ceph-mon -c /ceph.conf --public-addr `grep ceph /etc/hosts | awk '{print $1}'` -i host1
/usr/local/bin/ceph osd create e90dfd37-98d1-45bb-a847-8590a5ed8e71
/usr/local/bin/ceph osd create e90dfd37-98d1-45bb-a847-8590a5ed8e71
/usr/local/bin/ceph-osd -i 0 --mkfs --mkkey
/usr/local/bin/ceph auth add osd.0 osd 'allow *' mon 'allow rwx'  -i /data/ceph.osd0.keyring
/usr/local/bin/ceph osd crush add-bucket host1 host
/usr/local/bin/ceph osd crush move host1 root=default
/usr/local/bin/ceph osd crush add osd.0 1.0 host=host1

////////////////////////////////////////////////////

Script output seems to be ok:

creating /data/ceph.mon.keyring
creating /data/ceph.client.admin.keyring
importing contents of /data/ceph.client.admin.keyring into /data/ceph.mon.keyring
/usr/local/bin/monmaptool: monmap file /data/monmap
/usr/local/bin/monmaptool: set fsid to e90dfd37-98d1-45bb-a847-8590a5ed8e71
/usr/local/bin/monmaptool: writing epoch 0 to /data/monmap (1 monitors)
/usr/local/bin/ceph-mon: set fsid to e90dfd37-98d1-45bb-a847-8590a5ed8e71
/usr/local/bin/ceph-mon: created monfs at /data/mon0 for mon.host1
0
2014-02-16 13:24:37.833469 7f5ef61747c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-02-16 13:24:37.941111 7f5ef61747c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-02-16 13:24:37.948704 7f5ef61747c0 -1 filestore(/data/osd0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-02-16 13:24:38.054345 7f5ef61747c0 -1 created object store /data/osd0 journal /data/osd0.journal for osd.0 fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71
2014-02-16 13:24:38.054427 7f5ef61747c0 -1 auth: error reading file: /data/ceph.osd0.keyring: can't open /data/ceph.osd0.keyring: (2) No such file or directory
2014-02-16 13:24:38.054529 7f5ef61747c0 -1 created new key in keyring /data/ceph.osd0.keyring
added key for osd.0
added bucket host1 type host to crush map
moved item id -2 name 'host1' to location {root=default} in crush map
add item id 0 name 'osd.0' weight 1 at location {host=host1} to crush map

///////////////////////////////////////////////////////////////////////

I started the daemons using these commands:

/usr/local/bin/ceph-mon -c /ceph.conf --public-addr `grep ceph /etc/hosts | awk '{print $1}'` -i host1 --debug_ms 1000
/usr/local/bin/ceph-osd -c /ceph.conf -i 0 --debug_ms 1000

/////////////////////////////////////////////////////////////////////

The daemons are alive, and the log files show evidence of some communication between mon and osd:

mon0.log:

2014-02-16 13:27:30.399455 7f02699e0700 10 -- 172.17.0.4:6789/0 >> 172.17.0.4:6800/20 pipe(0x358a500 sd=21 :6789 s=2 pgs=1 cs=1 l=1 c=0x34d1a20).write_ack 8
2014-02-16 13:27:30.399463 7f02699e0700 10 -- 172.17.0.4:6789/0 >> 172.17.0.4:6800/20 pipe(0x358a500 sd=21 :6789 s=2 pgs=1 cs=1 l=1 c=0x34d1a20).writer: state = open policy.server=1
2014-02-16 13:27:30.399467 7f02699e0700 20 -- 172.17.0.4:6789/0 >> 172.17.0.4:6800/20 pipe(0x358a500 sd=21 :6789 s=2 pgs=1 cs=1 l=1 c=0x34d1a20).writer sleeping
2014-02-16 13:27:33.367087 7f02705a8700 20 -- 172.17.0.4:6789/0 >> 172.17.0.4:6800/20 pipe(0x358a500 sd=21 :6789 s=2 pgs=1 cs=1 l=1 c=0x34d1a20).reader got KEEPALIVE
2014-02-16 13:27:33.367133 7f02705a8700 20 -- 172.17.0.4:6789/0 >> 172.17.0.4:6800/20 pipe(0x358a500 sd=21 :6789 s=2 pgs=1 cs=1 l=1 c=0x34d1a20).reader reading tag...

osd0.log:

2014-02-16 13:27:53.368482 7f340bfb3700 20 -- 172.17.0.4:6800/20 send_keepalive con 0x2381a20, have pipe.
2014-02-16 13:27:53.368702 7f3420ef7700 10 -- 172.17.0.4:6800/20 >> 172.17.0.4:6789/0 pipe(0x2388000 sd=25 :38871 s=2 pgs=1 cs=1 l=1 c=0x2381a20).writer: state = open policy.server=0
2014-02-16 13:27:53.368806 7f3420ef7700 10 -- 172.17.0.4:6800/20 >> 172.17.0.4:6789/0 pipe(0x2388000 sd=25 :38871 s=2 pgs=1 cs=1 l=1 c=0x2381a20).write_keepalive
2014-02-16 13:27:53.369000 7f3420ef7700 10 -- 172.17.0.4:6800/20 >> 172.17.0.4:6789/0 pipe(0x2388000 sd=25 :38871 s=2 pgs=1 cs=1 l=1 c=0x2381a20).writer: state = open policy.server=0
2014-02-16 13:27:53.369070 7f3420ef7700 20 -- 172.17.0.4:6800/20 >> 172.17.0.4:6789/0 pipe(0x2388000 sd=25 :38871 s=2 pgs=1 cs=1 l=1 c=0x2381a20).writer sleeping

With best regards,
 Pavel.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

