Re: PGs stuck in inactive/unclean state + Association from PG-OSD does not seem to be happening.

Jan Pekař <jan.pekar@xxxxxxxxx> · Mon, 10 Nov 2014 22:48:07 +0100

It is simple.
When PGs are stuck like this, first look at the CRUSH map.

And there it is:

You have only one ruleset, the default ruleset 0, with "step take default"
(so it selects OSDs from the default root subtree), but your default root
doesn't contain any OSDs. See below:

rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

root default {
	id -1		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
}

I recommend adding the hosts octeon and octeon1 as items in the default
root, and it should work (or create another root and replace "step take
default" with a "step take" on your new root name).
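
A minimal sketch of how this check could be automated, assuming the decompiled CRUSH map text (the plain-text form shown above) is available as a string. The helper name empty_buckets and its simplified parsing are illustrative assumptions, not part of Ceph; it only lists buckets that have no "item" lines, which is the condition that leaves "step take default" with nothing to choose from.

```python
import re

def empty_buckets(crush_text):
    """Return names of buckets (host, root, ...) that contain no 'item' lines.

    Hypothetical helper: it only understands the plain-text layout shown
    above and ignores rule blocks.
    """
    empty = []
    # Match "<type> <name> { ... }" blocks; the closing brace starts a line.
    blocks = re.findall(r"^(\w+)\s+(\S+)\s*\{(.*?)^\}", crush_text, re.M | re.S)
    for btype, name, body in blocks:
        if btype == "rule":
            continue
        # Strip comments so "# weight 0.000" is not mistaken for content.
        lines = [l.split("#", 1)[0].strip() for l in body.splitlines()]
        if not any(l.startswith("item ") for l in lines):
            empty.append(name)
    return empty

sample = """
rule replicated_ruleset {
	ruleset 0
	step take default
	step emit
}

root default {
	id -1		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
}
"""
print(empty_buckets(sample))  # ['default']
```

Once the hosts are listed as items under the root, the same call would return an empty list for that map.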

JP

On 2014-11-10 20:21, Prashanth Nednoor wrote:
Folks,

Now we are running into an issue where the PGs (192) are stuck in the creating state forever.
I have experimented with various PG settings (osd_pool_default_pg_num from 50 to 400) for replicas and default, and it doesn't seem to help so far.
Just to give you a brief overview, I have 8 OSDs.
I see "create_pg is pending" messages in the ceph monitor logs.
I have attached the following logs in the zip file:
1) crush map (crush.map)
2) ceph osd tree (OSD_TREE.txt; OSDs 1,2,3,4 belong to host octeon and OSDs 0,5,6,7 belong to host octeon1)
3) ceph pg dump, health detail, etc. (dump_pgs, health_detail)
4) ceph.conf
5) ceph osd lspools:
0 data,1 metadata,2 rbd,

Here is the output of ceph -w before any OSDs were created:
ceph -w
     cluster 3eda0199-93a9-428b-8209-caeff84d3d3f
      health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
      monmap e1: 1 mons at {essperf13=209.243.160.45:6789/0}, election epoch 1, quorum 0 essperf13
      osdmap e205: 0 osds: 0 up, 0 in
       pgmap v928: 192 pgs, 3 pools, 0 bytes data, 0 objects
             0 kB used, 0 kB / 0 kB avail
                  192 creating

2014-11-05 23:26:46.555348 mon.0 [INF] pgmap v928: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail

Here is the output of ceph -w after 8 OSDs were created:
ceph -w
     cluster 3eda0199-93a9-428b-8209-caeff84d3d3f
      health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
      monmap e1: 1 mons at {essperf13=209.243.160.45:6789/0}, election epoch 1, quorum 0 essperf13
      osdmap e213: 8 osds: 8 up, 8 in
       pgmap v958: 192 pgs, 3 pools, 0 bytes data, 0 objects
             328 MB used, 14856 GB / 14856 GB avail
                  192 creating

2014-11-05 23:46:25.461143 mon.0 [INF] pgmap v958: 192 pgs: 192 creating; 0 bytes data, 328 MB used, 14856 GB / 14856 GB avail
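
The symptom in the two dumps above is that every PG stays in "creating" even once all 8 OSDs are up and in. A rough sketch of how that condition could be spotted from the pgmap log line follows; the helper name pg_states is hypothetical, and the comma-separated multi-state layout it accepts is an assumption about the log format, which may differ between Ceph releases.

```python
import re

def pg_states(pgmap_line):
    """Parse 'N pgs: A state1, B state2; ...' into {state: count}.

    Hypothetical helper for the log format shown above.
    """
    m = re.search(r"\d+ pgs: ([^;]+);", pgmap_line)
    if not m:
        return {}
    states = {}
    for part in m.group(1).split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return states

line = ("2014-11-05 23:46:25.461143 mon.0 [INF] pgmap v958: 192 pgs: "
        "192 creating; 0 bytes data, 328 MB used, 14856 GB / 14856 GB avail")
states = pg_states(line)
print(states)                       # {'creating': 192}
print(set(states) == {"creating"})  # True: no PG has left the creating state
```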

Any pointers to resolve this issue will be helpful.

Thanks
Prashanth

-----Original Message-----
From: Prashanth Nednoor
Sent: Tuesday, October 28, 2014 9:26 PM
To: 'Sage Weil'
Cc: Philip Kufeldt; ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: cephx auth issues:Having issues trying to get the OSD up on a MIPS64, when the OSD tries to communicate with the monitor!!!

Sage,

As requested, I set the debug settings in ceph.conf on both sides.
The logs for the OSD and the MONITOR are attached.
1) OSD: IP ADDRESS: 209.243.157.187, logfile attached is: Ceph-0.log
2) MONITOR: IP ADDRESS: 209.243.160.45, logfile attached is: Ceph-mon.essperf13.log

Please note that AUTHENTICATION IS DISABLED in the /etc/ceph/ceph.conf files on both the OSD and the monitor.
In addition, on the OSD side I bypassed the part of the authentication code that was causing trouble (monc->authenticate) in the osd_init function call. I hope this is OK.
The good news is that my OSD daemon is finally up on the MIPS side, but for some reason the MONITOR is still not detecting the OSD.

From the ceph mon log, it seems the monitor knows the OSD is at 187, and it does exchange some information.

Thanks for your prompt response and help.

Thanks
Prashanth

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Tuesday, October 28, 2014 4:59 PM
To: Prashanth Nednoor
Cc: Philip Kufeldt; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: cephx auth issues:Having issues trying to get the OSD up on a MIPS64, when the OSD tries to communicate with the monitor!!!

Hi,

On Tue, 28 Oct 2014, Prashanth Nednoor wrote:
Folks,

I am trying to get the OSD up and am having an issue. The OSD does exchange some messages with the MONITOR before this error.
It seems to be an issue with authentication in my setup, with a MIPS-based OSD and Intel XEON MONITORS. I have attached the logs.
The OSD (209.243.157.187) sends some request to the MONITOR (209.243.160.45).
I see the message "No session security set", followed by the message below.
The reply is coming back as auth_reply(proto 2 -1 (1) Operation not permitted.

Is there an ENDIAN issue here between the MIPS-based OSD (BIG ENDIAN) and the INTEL XEONS (LITTLE ENDIAN)? My CEPH MONITORS are INTEL XEONS.
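
For readers unfamiliar with the concern raised here: the same 32-bit integer has a different byte layout on a big-endian host and on a little-endian host, so a field written in native order on one side and read in native order on the other would be misinterpreted. The small illustration below shows only that general effect; the value 2 is arbitrary, and nothing here claims that the Ceph protocol actually has such a bug.

```python
import struct

value = 2
big = struct.pack(">I", value)     # layout a big-endian host uses natively
little = struct.pack("<I", value)  # layout a little-endian host uses natively

print(big.hex())     # 00000002
print(little.hex())  # 02000000

# Reading big-endian bytes as if they were little-endian gives another number.
misread = struct.unpack("<I", big)[0]
print(misread)       # 33554432
```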

I made sure the keyrings are all consistent. Here are the keys on OSD and MONITOR.

I tried disabling authentication by setting auth_service_required = none, auth_client_required = none and auth_cluster_required = none.
It looks like there was some issue with this in the osd_init code, where it seems AUTHENTICATION IS MANDATORY.

HERE IS THE INFORMATION ON MY KEYS ON OSD AND MONITOR.
ON THE OSD:
more /etc/ceph/ceph.client.admin.keyring
[osd.0]
         key = AQCddYJv4JkxIhAApeqP7Ahp+uUXYrgmgQt+LA==
[client.admin]
         key = AQA1jixUQAaWABAA1tAjhIbrmOCIqNAkeNVulQ==

more /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
         key = AQA1jixUwGjoGxAASUUlYC2rGfH7Zl4rCfCylA==

ON THE MONITOR:
more /etc/ceph/ceph.client.admin.keyring
[client.admin]
         key = AQA1jixUQAaWABAA1tAjhIbrmOCIqNAkeNVulQ==

more /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
         key = AQA1jixUwGjoGxAASUUlYC2rGfH7Zl4rCfCylA==

Any pointers are greatly appreciated.
Thanks in advance for help.

Can you put

  debug auth = 20
  debug mon = 20
  debug ms = 20

in the [global] section of ceph.conf and reproduce this, and attach both the ceph mon log and osd logs?

Thanks!
sage

thanks
Prashanth

-----Original Message-----
From: Prashanth Nednoor
Sent: Sunday, October 26, 2014 9:14 PM
To: 'Sage Weil'; Philip Kufeldt
Cc: 'ceph-devel@xxxxxxxxxxxxxxx'
Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!

Sage,

Good news, I am able to create the OSD successfully, let's see what's in store next.

It was an issue with leveldb 1.17 not having either memory barrier or atomic operation support for DEBIAN MIPS.
Not even the latest version, leveldb 1.18, which I pulled from https://github.com/google/leveldb, has it.

But this bug report talks about it:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681945

So I ported the memory barrier/atomic fix for MIPS onto leveldb 1.17. I had to look into the mips/barrier.h files on our eval board to make sure we had the correct macros.

Now my OSD creation is successful on the MIPS:
created object store /var/lib/ceph/osd/ceph-0 journal /dev/sda2 for osd.0 fsid f615496c-b40a-4905-bbcd-2d3e181ff21a
I have to start looking into the CLIENT/MONITOR side to make sure everything is good.

Really thankful for your suggestions for this quick resolution. For now we are good, until the next and then the next......

Thanks
Prashanth

-----Original Message-----
From: Prashanth Nednoor
Sent: Sunday, October 26, 2014 7:32 PM
To: 'Sage Weil'; Philip Kufeldt
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!

Hi Sage,

Leveldb version is 1.17.

Thanks
Prashanth

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Friday, October 24, 2014 6:11 PM
To: Philip Kufeldt
Cc: Prashanth Nednoor; ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!

On Sat, 25 Oct 2014, Philip Kufeldt wrote:
64 bit big endian

My guess is that there is an endianness bug in leveldb then.  I wonder who else has tried it on MIPS?

sage

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Friday, October 24, 2014 5:47 PM
To: Prashanth Nednoor
Cc: ceph-devel@xxxxxxxxxxxxxxx; Philip Kufeldt
Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!

Hi Prashanth,

On Fri, 24 Oct 2014, Prashanth Nednoor wrote:
Hi Sage,

Thank you for the prompt response.
Is there anything in /dev/disk/by-partuuid/ or is it missing entirely?
   Nothing, it was missing entirely.
   GOOD NEWS: I worked around this issue by setting my journal path in /etc/ceph.conf.

My udev version is udevd --version 164

Hmm, that should be new enough, but it seems like it isn't setting
up the links.  What distro is it?  On most systems it's
/lib/udev/rules.d/60-persistent-storage.rules that does it.
Maybe see if running partprobe /dev/sda helps, or run 'udevadm monitor'
and do 'udevadm trigger /dev/sda' in another terminal to see what happens.

Or, work around it like you did. :)

I still see the segfaults, I have attached details.
I put the osd debug logs(osd-output.txt) and the
leveldb_bt(leveldb_bt.txt).
Looks like we have an issue in leveldb....

Yeah, that looks like a problem with leveldb.  What distro is this?
What version leveldb?

I don't actually know anything about MIPS.. what's the word size
and endianness?

sage

HERE IS THE BACK TRACE: I have attached the gdb before running it.
#0  0x77f68ee0 in leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::FindGreaterOrEqual(char const* const&, leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::Node**) const () from /usr/local/lib/libleveldb.so.1
#1  0x77f69054 in leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::Insert(char const* const&) () from /usr/local/lib/libleveldb.so.1
#2  0x77f68618 in leveldb::MemTable::Add(unsigned long long, leveldb::ValueType, leveldb::Slice const&, leveldb::Slice const&) () from /usr/local/lib/libleveldb.so.1
#3  0x77f7e434 in leveldb::(anonymous namespace)::MemTableInserter::Put(leveldb::Slice const&, leveldb::Slice const&) () from /usr/local/lib/libleveldb.so.1
#4  0x77f7e93c in leveldb::WriteBatch::Iterate(leveldb::WriteBatch::Handler*) const () from /usr/local/lib/libleveldb.so.1
#5  0x77f7eb8c in leveldb::WriteBatchInternal::InsertInto(leveldb::WriteBatch const*, leveldb::MemTable*) () from /usr/local/lib/libleveldb.so.1
#6  0x77f59360 in leveldb::DBImpl::Write(leveldb::WriteOptions const&, leveldb::WriteBatch*) () from /usr/local/lib/libleveldb.so.1
#7  0x00a5dda0 in LevelDBStore::submit_transaction_sync (this=0x1f77d10, t=<value optimized out>) at os/LevelDBStore.cc:146
#8  0x00b0d344 in DBObjectMap::sync (this=0x1f7af28, oid=0x0, spos=0x72cfe3b8) at os/DBObjectMap.cc:1126
#9  0x009b10b8 in FileStore::_set_replay_guard (this=0x1f72450, fd=17, spos=..., hoid=0x0, in_progress=false) at os/FileStore.cc:2070
#10 0x009b1c0c in FileStore::_set_replay_guard (this=0x1f72450, cid=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/FileStore.cc:2047
#11 0x009b2138 in FileStore::_create_collection (this=0x1f72450, c=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/FileStore.cc:4753
#12 0x009e42a8 in FileStore::_do_transaction (this=0x1f72450, t=..., op_seq=<value optimized out>, trans_num=0, handle=0x72cfec3c) at os/FileStore.cc:2413
#13 0x009eb47c in FileStore::_do_transactions (this=0x1f72450, tls=..., op_seq=2, handle=0x72cfec3c) at os/FileStore.cc:1952
#14 0x009eb858 in FileStore::_do_op (this=0x1f72450, osr=0x1f801b8, handle=...) at os/FileStore.cc:1761
#15 0x00c8f0bc in ThreadPool::worker (this=0x1f72cf0, wt=0x1f7ea90) at common/WorkQueue.cc:128
#16 0x00c91b94 in ThreadPool::WorkThread::entry() ()
#17 0x77f1c0a8 in start_thread () from /lib/libpthread.so.0
#18 0x777c1738 in ?? () from /lib/libc.so.6

Do I need to set any variable, such as the cache size, in ceph.conf?
I only have osd_leveldb_cache_size=5242880 for now.

Thanks
Prashanth

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Thursday, October 23, 2014 5:54 PM
To: Prashanth Nednoor
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Having issues trying to get the OSD up on a MIPS64!!!

Hi Prashanth,

On Thu, 23 Oct 2014, Prashanth Nednoor wrote:
Hello Everyone,

We are using ceph-0.86. The good news is that we were able to compile
and load all the libraries and binaries needed to configure a
CEPH OSD on a MIPS64 platform. The CEPH monitor is also able to detect
the OSD, but it is not up yet, as the osd activate failed.
Since we don't have the required CEPH deploy utility for
MIPS64, we are following the manual procedure to create and activate an OSD.
We have disabled authentication between the clients and the
OSDs for now.

Has anybody tried CEPH on a MIPS64?
/dev/sda is a 2TB local hard drive.

This is how my partition looks after ceph-disk-prepare:
/home/prashan/ceph-0.86/src# parted
GNU Parted 2.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA TOSHIBA MQ01ABB2 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name          Flags
  2      1049kB  5369MB  5368MB               ceph journal
  1      5370MB  2000GB  1995GB  xfs          ceph data

The following are the steps to create an OSD:
1)	ceph-disk zap /dev/sda
2)	ceph-disk-prepare --cluster f615496c-b40a-4905-bbcd-2d3e181ff21a --fs-type xfs /dev/sda
3)	mount /dev/sda1 /var/lib/ceph/osd/ceph-0/
4)	ceph-osd -i 0 --mkfs gives an error:
filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file.
After this it segfaults. We have analyzed this further with the help of
strace and root-caused it to an issue reading the object map file.
open("/var/lib/ceph/osd/ceph-0/current/omap/000005.log", O_RDONLY) = 11
The first time, it reads 32k and the read succeeds with 63 bytes; it then
tries to read again with 27k, the read returns 0 bytes, and the CEPH OSD
segfaults.

Can you generate a full log with --debug-osd 20 --debug-filestore 20
--debug-journal 20 passed to ceph-osd --mkfs and post that somewhere?
It should tell us where things are going wrong.  In particular, we
want to see if that file/object is being written properly.  It
will also have a backtrace showing exactly where it crashed.

Please note that ceph-disk prepare creates a journal link to a path
which is not valid
(/dev/disk/by-partuuid/cbd4a5d1-012f-4863-b492-080ad2a505cb).
So after step 3 above I remove the journal shown below and manually
create a journal file before doing step 4.

ls -l /var/lib/ceph/osd/ceph-0/
total 16
-rw-r--r-- 1 root root 37 Oct 22 21:40 ceph_fsid
-rw-r--r-- 1 root root 37 Oct 22 21:40 fsid
lrwxrwxrwx 1 root root 58 Oct 22 21:40 journal -> /dev/disk/by-partuuid/cbd4a5d1-012f-4863-b492-080ad2a505cb

Is there anything in /dev/disk/by-partuuid/ or is it missing entirely?
Maybe you have an old udev.  What distro is this?

sage

-rw-r--r-- 1 root root 37 Oct 22 21:40 journal_uuid
-rw-r--r-- 1 root root 21 Oct 22 21:40 magic

Any pointers to move ahead will be greatly appreciated.

thanks
Prashanth

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx More
majordomo
info at  http://vger.kernel.org/majordomo-info.html

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============