firefly osds stuck in state booting

Hi Karan,

Thanks, that did the trick. The magic word was "in".

Regarding rep size: I have adjusted it. My settings are
--snip--
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
--snip--
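
A side note on those settings: as far as I understand, the "osd pool default
*" values only apply to pools created after the change, so the three pools
that already exist have to be resized per pool. Assuming the stock Firefly
data/metadata/rbd pools, something like:

--snip--
ceph osd pool set data size 2
ceph osd pool set metadata size 2
ceph osd pool set rbd size 2
--snip--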

# Also, in the meantime I had a chance to play with the ceph-deploy script.
# Maybe it was me, or maybe it is a bug; I have tried twice and every time
# I have hit this.

As I said before, I'm using a directory, as this is a test installation.

ceph-deploy osd prepare ceph2:/ceph2:/ceph2/journald <=== Works


but

--snip--
ceph-deploy osd activate ceph2:/ceph2:/ceph2/journald

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.9): /usr/bin/ceph-deploy osd activate ceph2:/ceph2:/ceph2/journald
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph2:/ceph2:/ceph2/journald
[ceph2][DEBUG ] connected to host: ceph2
[ceph2][DEBUG ] detect platform information from remote host
[ceph2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host ceph2 disk /ceph2
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[ceph2][INFO  ] Running command: sudo ceph-disk-activate --mark-init sysvinit --mount /ceph2
[ceph2][WARNIN] got monmap epoch 2
[ceph2][WARNIN] 2014-07-28 11:47:04.733204 7f08d1c667a0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
[ceph2][WARNIN] 2014-07-28 11:47:04.733400 7f08d1c667a0 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 4795daff-d63f-415b-9824-75f0863eb14f, invalid (someone else's?) journal
[ceph2][WARNIN] 2014-07-28 11:47:04.796835 7f08d1c667a0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
[ceph2][WARNIN] 2014-07-28 11:47:04.798944 7f08d1c667a0 -1 filestore(/ceph2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[ceph2][WARNIN] 2014-07-28 11:47:04.874282 7f08d1c667a0 -1 created object store /ceph2 journal /ceph2/journal for osd.1 fsid 109507ab-adf1-4eb6-aacf-0925494e3882
[ceph2][WARNIN] 2014-07-28 11:47:04.874474 7f08d1c667a0 -1 auth: error reading file: /ceph2/keyring: can't open /ceph2/keyring: (2) No such file or directory
[ceph2][WARNIN] 2014-07-28 11:47:04.875209 7f08d1c667a0 -1 created new key in keyring /ceph2/keyring
[ceph2][WARNIN] added key for osd.1
[ceph2][WARNIN] ceph-disk: Error: unable to create symlink /var/lib/ceph/osd/ceph-1 -> /ceph2
[ceph2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk-activate --mark-init sysvinit --mount /ceph2
--snip--



It turns out ceph-deploy does not create the directory /var/lib/ceph/osd;
if I create it myself, everything works.
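
For anyone else who hits this, creating the directory by hand before
re-running activate was enough, e.g.:

--snip--
# on the OSD host (ceph2)
sudo mkdir -p /var/lib/ceph/osd
# then again from the admin node
ceph-deploy osd activate ceph2:/ceph2:/ceph2/journald
--snip--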


Cheers


On Mon, Jul 28, 2014 at 9:09 AM, Karan Singh <karan.singh at csc.fi> wrote:

> The output that you have provided says that the OSDs are not IN. Try the below:
>
> ceph osd in osd.0
> ceph osd in osd.1
>
> service ceph start osd.0
> service ceph start osd.1
>
> If you have one more host with one disk, add it; starting with Ceph Firefly,
> the default rep size is 3.
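>
> Afterwards you can verify with something like:
>
> ceph osd tree   # both OSDs should show up, with reweight 1
> ceph -s         # the PGs should go active+clean once both OSDs are up and in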
>
>
> - Karan -
>
> On 27 Jul 2014, at 11:17, 10 minus <t10tennn at gmail.com> wrote:
>
> Hi Sage,
>
> I have unset all the flags and even restarted the OSDs.
> No dice, the OSDs are still stuck.
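>
> A quick way to double-check that the flags really are cleared (the ceph -s
> below no longer lists them) would be e.g.:
>
> --snip--
> ceph osd dump | grep flags
> --snip--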
>
>
>
> --snip--
> ceph daemon osd.0 status
> { "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
>   "osd_fsid": "1ad28bde-c23c-44ba-a3b7-0aaaafd3372e",
>   "whoami": 0,
>   "state": "booting",
>   "oldest_map": 1,
>   "newest_map": 24,
>   "num_pgs": 0}
>
> [root@ceph2 ~]# ceph daemon osd.1 status
> { "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
>   "osd_fsid": "becc3252-6977-47d6-87af-7b1337e591d8",
>   "whoami": 1,
>   "state": "booting",
>   "oldest_map": 1,
>   "newest_map": 21,
>   "num_pgs": 0}
> --snip--
>
> --snip--
> ceph osd tree
>
> # id    weight  type name       up/down reweight
> -1      2       root default
> -3      1               host ceph1
> 0       1                       osd.0   down    0
> -2      1               host ceph2
> 1       1                       osd.1   down    0
>
>  --snip--
>
> --snip--
>  ceph -s
>     cluster 2929fa80-0841-4cb6-a133-90b2098fc802
>      health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
>      monmap e2: 3 mons at {ceph0=10.0.12.220:6789/0,ceph1=10.0.12.221:6789/0,ceph2=10.0.12.222:6789/0}, election epoch 50, quorum 0,1,2 ceph0,ceph1,ceph2
>      osdmap e24: 2 osds: 0 up, 0 in
>       pgmap v25: 192 pgs, 3 pools, 0 bytes data, 0 objects
>             0 kB used, 0 kB / 0 kB avail
>                  192 creating
> --snip--
>
>
>
>
> On Sat, Jul 26, 2014 at 5:57 PM, Sage Weil <sweil at redhat.com> wrote:
>
>> On Sat, 26 Jul 2014, 10 minus wrote:
>> > Hi,
>> >
>> > I just set up a test Ceph installation on three CentOS 6.5 nodes.
>> > Two of the nodes are used for hosting OSDs and the third acts as mon.
>> >
>> > Please note I'm using LVM, so I had to set up the OSDs using the manual
>> > install guide.
>> >
>> > --snip--
>> > ceph -s
>> >     cluster 2929fa80-0841-4cb6-a133-90b2098fc802
>> >      health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean;
>> > noup,nodown,noout flag(s) set
>> >      monmap e2: 3 mons at {ceph0=10.0.12.220:6789/0,ceph1=10.0.12.221:6789/0,ceph2=10.0.12.222:6789/0}, election epoch 46, quorum 0,1,2 ceph0,ceph1,ceph2
>> >      osdmap e21: 2 osds: 0 up, 0 in
>> >             flags noup,nodown,noout
>>                     ^^^^
>>
>> Do 'ceph osd unset noup' and they should start up.  You likely also want
>> to clear nodown and noout as well.
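>>
>> That is, all three:
>>
>>   ceph osd unset noup
>>   ceph osd unset nodown
>>   ceph osd unset noout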
>>
>> sage
>>
>>
>> >       pgmap v22: 192 pgs, 3 pools, 0 bytes data, 0 objects
>> >             0 kB used, 0 kB / 0 kB avail
>> >                  192 creating
>> > --snip--
>> >
>> > osd tree
>> >
>> > --snip--
>> > ceph osd tree
>> > # id    weight  type name       up/down reweight
>> > -1      2       root default
>> > -3      1               host ceph1
>> > 0       1                       osd.0   down    0
>> > -2      1               host ceph2
>> > 1       1                       osd.1   down    0
>> > --snip--
>> >
>> > --snip--
>> >  ceph daemon osd.0 status
>> > { "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
>> >   "osd_fsid": "1ad28bde-c23c-44ba-a3b7-0aaaafd3372e",
>> >   "whoami": 0,
>> >   "state": "booting",
>> >   "oldest_map": 1,
>> >   "newest_map": 21,
>> >   "num_pgs": 0}
>> >
>> > --snip--
>> >
>> > --snip--
>> >  ceph daemon osd.1 status
>> > { "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
>> >   "osd_fsid": "becc3252-6977-47d6-87af-7b1337e591d8",
>> >   "whoami": 1,
>> >   "state": "booting",
>> >   "oldest_map": 1,
>> >   "newest_map": 21,
>> >   "num_pgs": 0}
>> > --snip--
>> >
>> > # CPUs are idling
>> >
>> > # Does anybody know what is wrong?
>> >
>> > Thanks in advance
>> >
>> >
>>
>