Re: PGs stuck in inactive/unclean state + association from PG to OSD does not seem to be happening.


 



Folks,

Now we are running into an issue where the PGs (192 of them) are stuck in the creating state forever.
I have experimented with various PG settings (osd_pool_default_pg_num from 50 to 400) as well as the replica and default pool settings, and it doesn't seem to help so far.
Just to give you a brief overview, I have 8 OSDs.
I see "create_pg is pending" messages in the ceph monitor logs.
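For reference, the per-pool PG counts can also be inspected and adjusted directly with commands along these lines (the pool name and target count here are only examples):

 ceph osd pool get rbd pg_num
 ceph osd pool set rbd pg_num 256
 ceph osd pool set rbd pgp_num 256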
I have attached the following logs in the zip file:
1) crush map (crush.map)
2) ceph osd tree (OSD_TREE.txt; OSDs 1, 2, 3, 4 belong to host octeon and OSDs 0, 5, 6, 7 belong to host octeon1)
3) ceph pg dump and health details (dump_pgs, health_detail)
4) ceph.conf
5) ceph osd lspools output:
0 data,1 metadata,2 rbd,

Here is the output of ceph -w before any OSDs were created:
ceph -w
    cluster 3eda0199-93a9-428b-8209-caeff84d3d3f
     health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
     monmap e1: 1 mons at {essperf13=209.243.160.45:6789/0}, election epoch 1, quorum 0 essperf13
     osdmap e205: 0 osds: 0 up, 0 in
      pgmap v928: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating

2014-11-05 23:26:46.555348 mon.0 [INF] pgmap v928: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail

Here is the output of ceph -w after the 8 OSDs were created:
ceph -w
    cluster 3eda0199-93a9-428b-8209-caeff84d3d3f
     health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {essperf13=209.243.160.45:6789/0}, election epoch 1, quorum 0 essperf13
     osdmap e213: 8 osds: 8 up, 8 in
      pgmap v958: 192 pgs, 3 pools, 0 bytes data, 0 objects
            328 MB used, 14856 GB / 14856 GB avail
                 192 creating

2014-11-05 23:46:25.461143 mon.0 [INF] pgmap v958: 192 pgs: 192 creating; 0 bytes data, 328 MB used, 14856 GB / 14856 GB avail

Any pointers to resolve this issue will be helpful.

Thanks
Prashanth



-----Original Message-----
From: Prashanth Nednoor
Sent: Tuesday, October 28, 2014 9:26 PM
To: 'Sage Weil'
Cc: Philip Kufeldt; ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: cephx auth issues: Having issues trying to get the OSD up on a MIPS64, when the OSD tries to communicate with the monitor!!!

Sage,

As requested, I set the debug settings in ceph.conf on both sides.
The logs for the OSD and the monitor are attached:
1) OSD: IP address 209.243.157.187; log file attached: Ceph-0.log
2) Monitor: IP address 209.243.160.45; log file attached: Ceph-mon.essperf13.log

Please note that authentication is disabled in the /etc/ceph/ceph.conf files on both the OSD and the monitor.
In addition, on the OSD side I bypassed the part of the authentication code that was causing trouble (the monc->authenticate call in the OSD init path). I hope this is OK.
The good news is that my OSD daemon is finally up on the MIPS side, but for some reason the monitor is still not detecting the OSD.

From the ceph mon log, it seems the monitor knows the OSD is at .187, and they do exchange some information.
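For reference, disabling cephx in ceph.conf amounts to settings along these lines (a sketch based on the options named later in this thread; the attached ceph.conf is the authoritative copy):

 [global]
     auth cluster required = none
     auth service required = none
     auth client required = none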

Thanks for your prompt response and help.

Thanks
Prashanth

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Tuesday, October 28, 2014 4:59 PM
To: Prashanth Nednoor
Cc: Philip Kufeldt; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: cephx auth issues: Having issues trying to get the OSD up on a MIPS64, when the OSD tries to communicate with the monitor!!!

Hi,

On Tue, 28 Oct 2014, Prashanth Nednoor wrote:
> Folks,
> 
> I am trying to get the OSD up and am running into an issue. The OSD does exchange some messages with the monitor before this error.
> It seems like an authentication issue in my setup with a MIPS-based OSD and Intel Xeon monitors. I have attached the logs.
> The OSD (209.243.157.187) sends some requests to the monitor (209.243.160.45).
> I see the message "No session security set", followed by the message below.
> The reply comes back as "auth_reply(proto 2 -1 (1) Operation not permitted".
> 
> Is there an endianness issue here between the MIPS-based OSD (big endian) and the Intel Xeons (little endian)? My ceph monitors are Intel Xeons.
> 
> I made sure the keyrings are all consistent. Here are the keys on the OSD and the monitor.
> 
> I tried disabling authentication by setting auth_service_required = none, auth_client_required = none, and auth_cluster_required = none.
> It looks like there was some issue with this in the OSD init code, where authentication seems to be mandatory.
> 
> HERE IS THE INFORMATION ON MY KEYS ON OSD AND MONITOR.
> ON THE OSD:
> more /etc/ceph/ceph.client.admin.keyring
> [osd.0]
>         key = AQCddYJv4JkxIhAApeqP7Ahp+uUXYrgmgQt+LA==
> [client.admin]
>         key = AQA1jixUQAaWABAA1tAjhIbrmOCIqNAkeNVulQ==
> 
> more /var/lib/ceph/bootstrap-osd/ceph.keyring
> [client.bootstrap-osd]
>         key = AQA1jixUwGjoGxAASUUlYC2rGfH7Zl4rCfCylA==
> 
> ON THE MONITOR:
> more /etc/ceph/ceph.client.admin.keyring
> [client.admin]
>         key = AQA1jixUQAaWABAA1tAjhIbrmOCIqNAkeNVulQ==
> 
> more /var/lib/ceph/bootstrap-osd/ceph.keyring
> [client.bootstrap-osd]
>         key = AQA1jixUwGjoGxAASUUlYC2rGfH7Zl4rCfCylA==
> 
> 
> Any pointers are greatly appreciated.
> Thanks in advance for your help.

Can you put

 debug auth = 20
 debug mon = 20
 debug ms = 20

in the [global] section of ceph.conf and reproduce this, and attach both the ceph mon log and osd logs?
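
For example, a minimal sketch of how that section would look (any entries already in [global] stay as they are):

 [global]
     debug auth = 20
     debug mon = 20
     debug ms = 20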

Thanks!
sage



> thanks
> Prashanth
> 
> 
> -----Original Message-----
> From: Prashanth Nednoor
> Sent: Sunday, October 26, 2014 9:14 PM
> To: 'Sage Weil'; Philip Kufeldt
> Cc: 'ceph-devel@xxxxxxxxxxxxxxx'
> Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!
> 
> Sage,
> 
> Good news: I am able to create the OSD successfully. Let's see what's in store next.
> 
> It was an issue with leveldb 1.17 not having memory barrier or atomic operation support for Debian MIPS.
> Not even the latest version (leveldb 1.18, pulled from https://github.com/google/leveldb) has it.
> 
> This Debian bug report talks about it:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681945
> 
> So I ported the memory barrier/atomic fix for MIPS onto leveldb 1.17. I had to look into the mips/barrier.h files on our eval board to make sure we had the correct macros.
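> The change is essentially adding a MIPS memory barrier to leveldb's port layer, roughly along these lines (a sketch; the exact guards and file layout depend on the leveldb version):
> 
>   // Added to the architecture-specific section of port/atomic_pointer.h (illustrative):
>   // full memory barrier on MIPS via the "sync" instruction.
>   inline void MemoryBarrier() {
>     __asm__ __volatile__("sync" : : : "memory");
>   }
>   #define LEVELDB_HAVE_MEMORY_BARRIER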
> 
> Now my OSD creation is successful on MIPS:
> created object store /var/lib/ceph/osd/ceph-0 journal /dev/sda2 for osd.0 fsid f615496c-b40a-4905-bbcd-2d3e181ff21a
> I have to start looking into the client/monitor side to make sure everything is good.
> 
> Really thankful for your suggestions and the quick resolution; for now we are good, until the next issue, and then the next...
> 
> Thanks
> Prashanth
> 
> -----Original Message-----
> From: Prashanth Nednoor
> Sent: Sunday, October 26, 2014 7:32 PM
> To: 'Sage Weil'; Philip Kufeldt
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!
> 
> Hi Sage,
> 
> Leveldb version is 1.17.
> 
> Thanks
> Prashanth
> 
> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> Sent: Friday, October 24, 2014 6:11 PM
> To: Philip Kufeldt
> Cc: Prashanth Nednoor; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!
> 
> On Sat, 25 Oct 2014, Philip Kufeldt wrote:
> > 64 bit big endian
> 
> My guess is that there is an endianness bug in leveldb then.  I wonder who else has tried it on MIPS?
> 
> sage
> 
> 
> > 
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> > > Sent: Friday, October 24, 2014 5:47 PM
> > > To: Prashanth Nednoor
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx; Philip Kufeldt
> > > Subject: RE: Having issues trying to get the OSD up on a MIPS64!!!
> > > 
> > > Hi Prashanth,
> > > 
> > > On Fri, 24 Oct 2014, Prashanth Nednoor wrote:
> > > > Hi Sage,
> > > >
> > > > Thank you for the prompt response.
> > > > Is there anything in /dev/disk/by-partuuid/ or is it missing entirely?
> > > >   Nothing; it was missing entirely.
> > > >   Good news: I worked around this issue by setting my journal path in ceph.conf.
> > > >
> > > > My udev version is 164 (udevd --version).
> > > 
> > > Hmm, that should be new enough, but it seems like it isn't setting up the links. What distro is it?
> > > On most systems it's /lib/udev/rules.d/60-persistent-storage.rules that does it.
> > > Maybe try running partprobe /dev/sda, or run 'udevadm monitor' and do 'udevadm trigger /dev/sda' in another terminal to see what happens.
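> > > For example, something like this (illustrative; assumes the device is /dev/sda as above):
> > > 
> > >   # terminal 1: watch udev events as they are processed
> > >   udevadm monitor
> > > 
> > >   # terminal 2: re-read the partition table, then re-trigger the device
> > >   partprobe /dev/sda
> > >   udevadm trigger /dev/sda
> > > 
> > >   # check whether the by-partuuid links show up
> > >   ls -l /dev/disk/by-partuuid/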
> > > 
> > > Or, work around it like you did. :)
> > > 
> > > > I still see the segfaults; I have attached details.
> > > > I have included the OSD debug log (osd-output.txt) and the leveldb backtrace (leveldb_bt.txt).
> > > > Looks like we have an issue in leveldb...
> > > 
> > > Yeah, that looks like a problem with leveldb. What distro is this? What version of leveldb?
> > > 
> > > I don't actually know anything about MIPS... what's the word size and endianness?
> > > 
> > > sage
> > > 
> > > 
> > > >
> > > > Here is the backtrace; I attached gdb before running it.
> > > > #0  0x77f68ee0 in leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::FindGreaterOrEqual(char const* const&, leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::Node**) const () from /usr/local/lib/libleveldb.so.1
> > > > #1  0x77f69054 in leveldb::SkipList<char const*, leveldb::MemTable::KeyComparator>::Insert(char const* const&) () from /usr/local/lib/libleveldb.so.1
> > > > #2  0x77f68618 in leveldb::MemTable::Add(unsigned long long, leveldb::ValueType, leveldb::Slice const&, leveldb::Slice const&) () from /usr/local/lib/libleveldb.so.1
> > > > #3  0x77f7e434 in leveldb::(anonymous namespace)::MemTableInserter::Put(leveldb::Slice const&, leveldb::Slice const&) () from /usr/local/lib/libleveldb.so.1
> > > > #4  0x77f7e93c in leveldb::WriteBatch::Iterate(leveldb::WriteBatch::Handler*) const () from /usr/local/lib/libleveldb.so.1
> > > > #5  0x77f7eb8c in leveldb::WriteBatchInternal::InsertInto(leveldb::WriteBatch const*, leveldb::MemTable*) () from /usr/local/lib/libleveldb.so.1
> > > > #6  0x77f59360 in leveldb::DBImpl::Write(leveldb::WriteOptions const&, leveldb::WriteBatch*) () from /usr/local/lib/libleveldb.so.1
> > > > #7  0x00a5dda0 in LevelDBStore::submit_transaction_sync (this=0x1f77d10, t=<value optimized out>) at os/LevelDBStore.cc:146
> > > > #8  0x00b0d344 in DBObjectMap::sync (this=0x1f7af28, oid=0x0, spos=0x72cfe3b8) at os/DBObjectMap.cc:1126
> > > > #9  0x009b10b8 in FileStore::_set_replay_guard (this=0x1f72450, fd=17, spos=..., hoid=0x0, in_progress=false) at os/FileStore.cc:2070
> > > > #10 0x009b1c0c in FileStore::_set_replay_guard (this=0x1f72450, cid=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at os/FileStore.cc:2047
> > > > #11 0x009b2138 in FileStore::_create_collection (this=0x1f72450, c=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at os/FileStore.cc:4753
> > > > #12 0x009e42a8 in FileStore::_do_transaction (this=0x1f72450, t=..., op_seq=<value optimized out>, trans_num=0, handle=0x72cfec3c) at os/FileStore.cc:2413
> > > > #13 0x009eb47c in FileStore::_do_transactions (this=0x1f72450, tls=..., op_seq=2, handle=0x72cfec3c) at os/FileStore.cc:1952
> > > > #14 0x009eb858 in FileStore::_do_op (this=0x1f72450, osr=0x1f801b8, handle=...) at os/FileStore.cc:1761
> > > > #15 0x00c8f0bc in ThreadPool::worker (this=0x1f72cf0, wt=0x1f7ea90) at common/WorkQueue.cc:128
> > > > #16 0x00c91b94 in ThreadPool::WorkThread::entry() ()
> > > > #17 0x77f1c0a8 in start_thread () from /lib/libpthread.so.0
> > > > #18 0x777c1738 in ?? () from /lib/libc.so.6
> > > >
> > > > Do I need to set any variables for the cache size etc. in ceph.conf?
> > > > I only have osd_leveldb_cache_size = 5242880 for now.
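> > > > (For illustration, that setting sits in ceph.conf roughly like this; the [osd] section placement is only an assumption on my part:)
> > > >
> > > >  [osd]
> > > >      osd leveldb cache size = 5242880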
> > > >
> > > >
> > > > Thanks
> > > > Prashanth
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> > > > Sent: Thursday, October 23, 2014 5:54 PM
> > > > To: Prashanth Nednoor
> > > > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > > > Subject: Re: Having issues trying to get the OSD up on a MIPS64!!!
> > > >
> > > > Hi Prashanth,
> > > >
> > > > On Thu, 23 Oct 2014, Prashanth Nednoor wrote:
> > > > > Hello Everyone,
> > > > >
> > > > > We are using ceph-0.86. The good news is that we were able to compile and load all the libraries and binaries needed to configure a ceph-osd on the MIPS64 platform.
> > > > > The ceph monitor is also able to detect the OSD, but it is not up yet, as the OSD activation failed.
> > > > > Since we don't have the required ceph-deploy utility for MIPS64, we are following the manual procedure to create and activate an OSD.
> > > > > We have disabled authentication between the clients and the OSDs for now.
> > > > >
> > > > > Has anybody tried Ceph on MIPS64?
> > > > > /dev/sda is a 2TB local hard drive.
> > > > >
> > > > > This is how the partition table looks after ceph-disk-prepare:
> > > > > /home/prashan/ceph-0.86/src# parted
> > > > > GNU Parted 2.3
> > > > > Using /dev/sda
> > > > > Welcome to GNU Parted! Type 'help' to view a list of commands.
> > > > > (parted) p
> > > > > Model: ATA TOSHIBA MQ01ABB2 (scsi)
> > > > > Disk /dev/sda: 2000GB
> > > > > Sector size (logical/physical): 512B/4096B
> > > > > Partition Table: gpt
> > > > >
> > > > > Number  Start   End     Size    File system  Name          Flags
> > > > >  2      1049kB  5369MB  5368MB               ceph journal
> > > > >  1      5370MB  2000GB  1995GB  xfs          ceph data
> > > > >
> > > > >
> > > > >
> > > > > The following are the steps used to create an OSD:
> > > > > 1) ceph-disk zap /dev/sda
> > > > > 2) ceph-disk-prepare --cluster f615496c-b40a-4905-bbcd-2d3e181ff21a --fs-type xfs /dev/sda
> > > > > 3) mount /dev/sda1 /var/lib/ceph/osd/ceph-0/
> > > > > 4) ceph-osd -i 0 --mkfs gives an error:
> > > > >    filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file.
> > > > > After this it segfaults. We analyzed this further with the help of strace and root-caused it to an objectmap file reading issue:
> > > > > open("/var/lib/ceph/osd/ceph-0/current/omap/000005.log", O_RDONLY) = 11
> > > > > The first time it reads 32k, the read succeeds with 63 bytes; it then tries to read again with 27k, the read returns 0 bytes, and the ceph-osd segfaults.
> > > >
> > > > Can you generate a full log with --debug-osd 20 --debug-filestore 20 --debug-journal 20 passed to ceph-osd --mkfs and post that somewhere?
> > > > It should tell us where things are going wrong. In particular, we want to see if that file/object is being written properly.
> > > > It will also have a backtrace showing exactly where it crashed.
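> > > > For example, something along these lines (a sketch; the log path is only illustrative):
> > > > 
> > > >   ceph-osd -i 0 --mkfs \
> > > >       --debug-osd 20 --debug-filestore 20 --debug-journal 20 \
> > > >       --log-file /var/log/ceph/ceph-osd.0.mkfs.log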
> > > >
> > > > > Please note that ceph-disk prepare creates a journal link to a path which is not valid (/dev/disk/by-partuuid/cbd4a5d1-012f-4863-b492-080ad2a505cb).
> > > > > So after step 3 above I remove this journal link (shown in the listing below) and manually create a journal file before doing step 4.
> > > > >
> > > > >
> > > > > ls -l /var/lib/ceph/osd/ceph-0/
> > > > > total 16
> > > > > -rw-r--r-- 1 root root 37 Oct 22 21:40 ceph_fsid
> > > > > -rw-r--r-- 1 root root 37 Oct 22 21:40 fsid
> > > > > lrwxrwxrwx 1 root root 58 Oct 22 21:40 journal -> /dev/disk/by-partuuid/cbd4a5d1-012f-4863-b492-080ad2a505cb
> > > >
> > > > Is there anything in /dev/disk/by-partuuid/ or is it missing entirely?
> > > > Maybe you have an old udev.  What distro is this?
> > > >
> > > > sage
> > > >
> > > > > -rw-r--r-- 1 root root 37 Oct 22 21:40 journal_uuid
> > > > > -rw-r--r-- 1 root root 21 Oct 22 21:40 magic
> > > > >
> > > > > Any pointers on how to move ahead would be greatly appreciated.
> > > > >
> > > > > thanks
> > > > > Prashanth
> > > > >
> > > > >
> > > > >
> > > >
> > 
> > 
> 

Attachment: ceph.conf
Description: ceph.conf

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
root default {
	id -1		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
}
host octeon1 {
	id -2		# do not change unnecessarily
	# weight 4.000
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 1.000
	item osd.5 weight 1.000
	item osd.6 weight 1.000
	item osd.7 weight 1.000
}
host octeon {
	id -3		# do not change unnecessarily
	# weight 4.000
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 1.000
	item osd.2 weight 1.000
	item osd.3 weight 1.000
	item osd.4 weight 1.000
}

# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map

Attachment: crush_dump
Description: crush_dump

Attachment: dump_pgs
Description: dump_pgs

Attachment: health_detail
Description: health_detail

# id    weight  type name       up/down reweight
-3      4       host octeon
1       1               osd.1   up      1
2       1               osd.2   up      1
3       1               osd.3   up      1
4       1               osd.4   up      1
-2      4       host octeon1
0       1               osd.0   up      1
5       1               osd.5   up      1
6       1               osd.6   up      1
7       1               osd.7   up      1
-1      0       root default
