On Wed, May 25, 2016 at 3:00 PM, Mathias Buresch <mathias.buresch@xxxxxxxxxxxx> wrote:
> I don't know what exactly is segfaulting.
>
> Here is the output with command line flags and gdb (I can't really
> notice errors in that output):
>
> # ceph -s --debug-monc=20 --debug-ms=20
> 2016-05-25 14:51:02.406135 7f188300a700 10 monclient(hunting): build_initial_monmap
> 2016-05-25 14:51:02.406444 7f188300a700 10 -- :/0 ready :/0
> 2016-05-25 14:51:02.407214 7f188300a700 1 -- :/0 messenger.start
> 2016-05-25 14:51:02.407261 7f188300a700 10 monclient(hunting): init
> 2016-05-25 14:51:02.407291 7f188300a700 10 monclient(hunting): auth_supported 2 method cephx
> 2016-05-25 14:51:02.407312 7f187b7fe700 10 -- :/2987460054 reaper_entry start
> 2016-05-25 14:51:02.407380 7f187b7fe700 10 -- :/2987460054 reaper
> 2016-05-25 14:51:02.407383 7f187b7fe700 10 -- :/2987460054 reaper done
> 2016-05-25 14:51:02.407638 7f188300a700 10 monclient(hunting): _reopen_session rank -1 name
> 2016-05-25 14:51:02.407646 7f188300a700 10 -- :/2987460054 connect_rank to 62.176.141.181:6789/0, creating pipe and registering
> 2016-05-25 14:51:02.407686 7f188300a700 10 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).register_pipe
> 2016-05-25 14:51:02.407698 7f188300a700 10 -- :/2987460054 get_connection mon.0 62.176.141.181:6789/0 new 0x7f187c064010
> 2016-05-25 14:51:02.407693 7f1879ffb700 10 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).writer: state = connecting policy.server=0
> 2016-05-25 14:51:02.407723 7f1879ffb700 10 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect 0
> 2016-05-25 14:51:02.407738 7f188300a700 10 monclient(hunting): picked mon.pix01 con 0x7f187c05aa50 addr 62.176.141.181:6789/0
> 2016-05-25 14:51:02.407745 7f188300a700 20 -- :/2987460054 send_keepalive con 0x7f187c05aa50, have pipe.
> 2016-05-25 14:51:02.407744 7f1879ffb700 10 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connecting to 62.176.141.181:6789/0
> 2016-05-25 14:51:02.407759 7f188300a700 10 monclient(hunting): _send_mon_message to mon.pix01 at 62.176.141.181:6789/0
> 2016-05-25 14:51:02.407763 7f188300a700 1 -- :/2987460054 --> 62.176.141.181:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f187c060380 con 0x7f187c05aa50
> 2016-05-25 14:51:02.407768 7f188300a700 20 -- :/2987460054 submit_message auth(proto 0 30 bytes epoch 0) v1 remote, 62.176.141.181:6789/0, have pipe.
> 2016-05-25 14:51:02.407773 7f188300a700 10 monclient(hunting): renew_subs
> 2016-05-25 14:51:02.407777 7f188300a700 10 monclient(hunting): authenticate will time out at 2016-05-25 14:56:02.407777
> 2016-05-25 14:51:02.408128 7f1879ffb700 20 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect read peer addr 62.176.141.181:6789/0 on socket 3
> 2016-05-25 14:51:02.408144 7f1879ffb700 20 -- :/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect peer addr for me is 62.176.141.181:37964/0
> 2016-05-25 14:51:02.408148 7f1879ffb700 1 -- 62.176.141.181:0/2987460054 learned my addr 62.176.141.181:0/2987460054
> 2016-05-25 14:51:02.408188 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect sent my addr 62.176.141.181:0/2987460054
> 2016-05-25 14:51:02.408197 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect sending gseq=1 cseq=0 proto=15
> 2016-05-25 14:51:02.408207 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect wrote (self +) cseq, waiting for reply
> 2016-05-25 14:51:02.408259 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=1 pgs=0 cs=0 l=1 c=0x7f187c05aa50).connect got reply tag 1 connect_seq 1 global_seq 327710 proto 15 flags 1 features 55169095435288575
> 2016-05-25 14:51:02.408269 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).connect success 1, lossy = 1, features 55169095435288575
> 2016-05-25 14:51:02.408280 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).connect starting reader
> 2016-05-25 14:51:02.408325 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer: state = open policy.server=0
> 2016-05-25 14:51:02.408343 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).write_keepalive2 14 2016-05-25 14:51:02.408342
> 2016-05-25 14:51:02.408378 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer encoding 1 features 55169095435288575 0x7f187c060380 auth(proto 0 30 bytes epoch 0) v1
> 2016-05-25 14:51:02.408356 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader reading tag...
> 2016-05-25 14:51:02.408406 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer no session security
> 2016-05-25 14:51:02.408415 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer sending 1 0x7f187c060380
> 2016-05-25 14:51:02.408453 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer: state = open policy.server=0
> 2016-05-25 14:51:02.408455 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got KEEPALIVE_ACK
> 2016-05-25 14:51:02.408463 7f1879ffb700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer sleeping
> 2016-05-25 14:51:02.408482 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader reading tag...
> 2016-05-25 14:51:02.408696 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got ACK
> 2016-05-25 14:51:02.408713 7f1879efa700 15 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got ack seq 1
> 2016-05-25 14:51:02.408721 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader reading tag...
> 2016-05-25 14:51:02.408732 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got MSG
> 2016-05-25 14:51:02.408739 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got envelope type=4 src mon.0 front=340 data=0 off 0
> 2016-05-25 14:51:02.408751 7f1879efa700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader wants 340 from dispatch throttler 0/104857600
> 2016-05-25 14:51:02.408763 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got front 340
> 2016-05-25 14:51:02.408770 7f1879efa700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).aborted = 0
> 2016-05-25 14:51:02.408776 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got 340 + 0 + 0 byte message
> 2016-05-25 14:51:02.408801 7f1879efa700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).No session security set
> 2016-05-25 14:51:02.408813 7f1879efa700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader got message 1 0x7f186c001cb0 mon_map magic: 0 v1
> 2016-05-25 14:51:02.408827 7f1879efa700 20 -- 62.176.141.181:0/2987460054 queue 0x7f186c001cb0 prio 196
> 2016-05-25 14:51:02.408837 7f1879efa700 20 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).reader reading tag...
> 2016-05-25 14:51:02.408851 7f1879ffb700 10 -- 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 c=0x7f187c05aa50).writer: state = open policy.server=0
> Segmentation fault
>
>
> (gdb) run /usr/bin/ceph status
> Starting program: /usr/bin/python /usr/bin/ceph status
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff10f5700 (LWP 23401)]
> [New Thread 0x7ffff08f4700 (LWP 23402)]
> [Thread 0x7ffff10f5700 (LWP 23401) exited]
> [New Thread 0x7ffff10f5700 (LWP 23403)]
> [Thread 0x7ffff10f5700 (LWP 23403) exited]
> [New Thread 0x7ffff10f5700 (LWP 23404)]
> [Thread 0x7ffff10f5700 (LWP 23404) exited]
> [New Thread 0x7ffff10f5700 (LWP 23405)]
> [Thread 0x7ffff10f5700 (LWP 23405) exited]
> [New Thread 0x7ffff10f5700 (LWP 23406)]
> [Thread 0x7ffff10f5700 (LWP 23406) exited]
> [New Thread 0x7ffff10f5700 (LWP 23407)]
> [Thread 0x7ffff10f5700 (LWP 23407) exited]
> [New Thread 0x7ffff10f5700 (LWP 23408)]
> [New Thread 0x7fffeb885700 (LWP 23409)]
> [New Thread 0x7fffeb084700 (LWP 23410)]
> [New Thread 0x7fffea883700 (LWP 23411)]
> [New Thread 0x7fffea082700 (LWP 23412)]
> [New Thread 0x7fffe9881700 (LWP 23413)]
> [New Thread 0x7fffe9080700 (LWP 23414)]
> [New Thread 0x7fffe887f700 (LWP 23415)]
> [New Thread 0x7fffe807e700 (LWP 23416)]
> [New Thread 0x7fffe7f7d700 (LWP 23419)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffea883700 (LWP 23411)]
> 0x00007ffff3141a57 in ?? () from /usr/lib/librados.so.2
> (gdb) bt
> #0 0x00007ffff3141a57 in ?? () from /usr/lib/librados.so.2
> #1 0x00007ffff313aff4 in ?? () from /usr/lib/librados.so.2
> #2 0x00007ffff2fe4a79 in ?? () from /usr/lib/librados.so.2
> #3 0x00007ffff2fe6507 in ?? () from /usr/lib/librados.so.2
> #4 0x00007ffff30d5dc9 in ?? () from /usr/lib/librados.so.2
> #5 0x00007ffff31023bd in ?? () from /usr/lib/librados.so.2
> #6 0x00007ffff7bc4182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #7 0x00007ffff78f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
>
> Does that help? I can't really see where the error is. :)

Hmm, can you try getting that backtrace again after installing the
ceph-debuginfo package? Also add --debug-rados=20 to your command
line (you can use all the --debug... options when you're running
inside gdb to get the logs and the backtrace in one).

John

>
> -----Original Message-----
> From: John Spray <jspray@xxxxxxxxxx>
> To: Mathias Buresch <mathias.buresch@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxx <ceph-users@xxxxxxxx>
> Subject: Re: Ceph Status - Segmentation Fault
> Date: Wed, 25 May 2016 10:16:55 +0100
>
> On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch
> <mathias.buresch@xxxxxxxxxxxx> wrote:
>>
>> Please find the logs with higher debug level attached to this email.
> You've attached the log from your mon, but it's not your mon that's
> segfaulting, right?
>
> You can use normal ceph command line flags to crank up the verbosity
> on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind).
>
> You can also run the ceph CLI in gdb like this:
> gdb python
> (gdb) run /usr/bin/ceph status
> ... hopefully it crashes and then ...
> (gdb) bt
>
> Cheers,
> John
>
>>
>>
>>
>> Kind regards
>> Mathias
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
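For anyone following John's suggestion of combining the debug flags with gdb, here is a minimal sketch of what a single non-interactive invocation might look like. The helper name `ceph_gdb_cmd` is hypothetical and only prints the command for review. The path `/usr/bin/ceph` and the three `--debug-*` flags are taken from the thread, and `-batch`, `-ex` and `--args` are standard gdb options. The backtrace will only show symbol names once the debug symbols are installed. John names the package ceph-debuginfo, but the package name probably differs on Debian-style systems, which is an assumption to check against your distribution.

```shell
#!/bin/sh
# Hypothetical helper: prints a gdb command line that runs the ceph CLI
# (a Python script) under gdb with the debug flags mentioned in the thread,
# then prints a backtrace once the program stops (e.g. on SIGSEGV).
ceph_gdb_cmd() {
    # Default path is the one shown in the gdb session in the thread.
    ceph_bin="${1:-/usr/bin/ceph}"
    printf 'gdb -batch -ex run -ex bt --args python %s status --debug-monc=20 --debug-ms=20 --debug-rados=20\n' "$ceph_bin"
}

# Only print the command here; nothing is executed.
ceph_gdb_cmd
```

To actually run it, the printed line can be pasted into a shell, or executed with `sh -c "$(ceph_gdb_cmd)"`. Logs and backtrace then end up in the same output, which is what John asked for.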