RE: SIGTERM and osd close

It looks like the OSD cannot communicate with the monitor and is stuck waiting there. Is your monitor running?
I am not very familiar with vstart since I rarely use it, but if I remember correctly, stop.sh stops all the services, including the mon.
I can see that store->umount() is called from OSD::shutdown(), so it should clean up properly.
If you send a kill signal (not -9, of course) to the OSD without running stop.sh, it should execute the shutdown properly.
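
As a rough illustration of why a plain kill (SIGTERM) can lead to an orderly shutdown while kill -9 (SIGKILL) cannot: SIGTERM can be caught by the process, whereas SIGKILL cannot be caught, blocked, or ignored. The sketch below is generic POSIX C++, not Ceph code. The names graceful_shutdown, shutdown_ran, wait_for_termination_signal, and run_demo are hypothetical, and the sketch assumes a dedicated thread that waits for the signal with sigwait() and then runs the cleanup.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Hypothetical stand-in for a daemon's cleanup routine (not Ceph code).
static std::atomic<bool> shutdown_ran{false};
static void graceful_shutdown() { shutdown_ran = true; }

// Runs on a dedicated thread: waits synchronously for one of the signals
// in `set`, then performs the cleanup outside of signal-handler context.
// Returns the number of the signal that was received.
int wait_for_termination_signal(const sigset_t& set) {
  int signum = 0;
  const int rc = sigwait(&set, &signum);
  assert(rc == 0);
  graceful_shutdown();
  return signum;
}

int run_demo() {
  // Block SIGINT and SIGTERM in this thread; threads created afterwards
  // inherit the mask, so the signals stay pending until sigwait() takes them.
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGINT);
  sigaddset(&set, SIGTERM);
  pthread_sigmask(SIG_BLOCK, &set, nullptr);

  int received = 0;
  std::thread handler([&] { received = wait_for_termination_signal(set); });

  // Same effect as running `kill <pid>` from a shell (SIGTERM by default).
  kill(getpid(), SIGTERM);
  handler.join();
  return received;
}
```

Calling run_demo() from main() should return SIGTERM with shutdown_ran set. Sending SIGKILL instead would end the process immediately, and no cleanup code would get a chance to run.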

Thanks & Regards
Somnath

-----Original Message-----
From: Ramesh Chander
Sent: Sunday, July 10, 2016 11:42 PM
To: Sage Weil
Cc: Brad Hubbard; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: SIGTERM and osd close

I tried to trace down shutdown call with signal SIGTERM in osd.

It seems shutdown call never reached BlueStore::umount.

Steps:

1. Start the OSD.
2. Attach gdb and set breakpoints:
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007f5f8ba5e8a0 in OSD::shutdown() at osd/OSD.cc:2599
2       breakpoint     keep y   0x00007f5f8bd65160 in BlueStore::umount() at os/bluestore/BlueStore.cc:2686

3. Trigger stop.sh

Breakpoint 1 is hit, but the second breakpoint is never hit. The thread gets stuck somewhere in this call:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f5f8ba515b5 in WaitUntil (when=..., mutex=..., this=0x7f5f95428e20) at ./common/Cond.h:72
#2  OSDService::prepare_to_stop (this=this@entry=0x7f5f954275c8) at osd/OSD.cc:1174
#3  0x00007f5f8ba5e8cb in OSD::shutdown (this=this@entry=0x7f5f95426000) at osd/OSD.cc:2600
#4  0x00007f5f8ba604d0 in OSD::handle_signal (this=0x7f5f95426000, signum=<optimized out>) at osd/OSD.cc:1739
#5  0x00007f5f8c0209b7 in SignalHandler::entry (this=0x7f5f952a8560) at global/signal_handler.cc:252
#6  0x00007f5f89f19182 in start_thread (arg=0x7f5f686d7700) at pthread_create.c:312
#7  0x00007f5f87e2c00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
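
Frames #1 and #2 show the signal-handler thread parked in a timed condition-variable wait inside OSDService::prepare_to_stop. The general pattern can be sketched as below. This is generic C++ rather than Ceph code: StopCoordinator, got_ack, and stopping are hypothetical names chosen for illustration, and the assumption is that the shutdown path waits a bounded time for a peer's acknowledgement before continuing.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not the Ceph implementation.
class StopCoordinator {
 public:
  // Shutdown path: wait until the peer acknowledges or the deadline passes,
  // then move on to the rest of the shutdown either way.
  // Returns true if the acknowledgement arrived before the deadline.
  bool prepare_to_stop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    const bool acked =
        cond_.wait_until(lock, deadline, [this] { return acked_; });
    stopping_ = true;
    return acked;
  }

  // Called from another thread when the peer's acknowledgement arrives.
  void got_ack() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acked_ = true;
    }
    cond_.notify_all();
  }

  // True once prepare_to_stop() has returned.
  bool stopping() {
    std::lock_guard<std::mutex> lock(mutex_);
    return stopping_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  bool acked_ = false;
  bool stopping_ = false;
};
```

In a pattern like this, a missing acknowledgement delays the shutdown by up to the timeout instead of blocking it forever, so a thread observed inside the wait may simply not have reached its deadline yet.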


Any idea what is happening? I think things keep working only because recovery is in place, and the OSD never actually does a graceful shutdown?

Or am I missing something here?


-Ramesh


> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Friday, July 01, 2016 7:32 PM
> To: Ramesh Chander
> Cc: Brad Hubbard; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: SIGTERM and osd close
>
> On Fri, 1 Jul 2016, Ramesh Chander wrote:
> > Thank you all for the replies,
> >
> > Brad,
> >
> > I should trace the code path you pointed out.
>
> In this case, the important bit is BlueFS::umount(), which calls
> BlueFS::_stop_alloc().  BlueStore::_close_db() should be calling
> bluefs->umount().  Any of the unit tests should be triggering these
> code paths.
>
> sage
>
>
> >
> > -Regards,
> > Ramesh
> >
> > > -----Original Message-----
> > > From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> > > Sent: Friday, July 01, 2016 3:48 AM
> > > To: Somnath Roy
> > > Cc: Ramesh Chander; ceph-devel@xxxxxxxxxxxxxxx
> > > Subject: Re: SIGTERM and osd close
> > >
> > > > On Fri, Jul 1, 2016 at 4:44 AM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> > > > > I guess you need to call it from BlueStore::umount() for the cleanup work.
> > > >
> > > > -----Original Message-----
> > > > > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > > > > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Ramesh Chander
> > > > Sent: Thursday, June 30, 2016 8:46 AM
> > > > To: ceph-devel@xxxxxxxxxxxxxxx
> > > > Subject: SIGTERM and osd close
> > > >
> > > > Hi All,
> > > >
> > > > > When I use stop.sh without any argument, I suppose it calls pkill with
> > > > > SIGTERM on osds as well as other processes.
> > >
> > > 616   // install signal handlers
> > > 617   init_async_signal_handler();
> > > 618   register_async_signal_handler(SIGHUP, sighup_handler);
> > > 619   register_async_signal_handler_oneshot(SIGINT,
> handle_osd_signal);
> > > 620   register_async_signal_handler_oneshot(SIGTERM,
> handle_osd_signal);
> > >
> > >  65 void handle_osd_signal(int signum)
> > >  66 {
> > >  67   if (osd)
> > >  68     osd->handle_signal(signum);
> > >  69 }
> > >
> > > 1735 void OSD::handle_signal(int signum)
> > > 1736 {
> > > 1737   assert(signum == SIGINT || signum == SIGTERM);
> > > 1738   derr << "*** Got signal " << sig_str(signum) << " ***" << dendl;
> > > 1739   shutdown();
> > > 1740 }
> > >
> > > 2598 int OSD::shutdown()
> > > 2599 {
> > >
> > > > OSD::shutdown() in src/osd/OSD.cc is quite a large function that
> > > > performs quite a bit of cleanup, such as draining and shutting down
> > > > thread pool work queues, shutting down messenger instances,
> > > > un-registering admin commands, shutting down the PGs, flushing
> > > > outstanding ops, updating the superblock and unmounting the filestore
> > > > (as Somnath mentioned, this might be where you want to look), shutting
> > > > down the MON client and clearing the peering work queue, in no
> > > > particular order.
> > >
> > > > So there is no doubt the OSD (and other daemons such as MON and MDS)
> > > > intercepts this signal and performs a graceful shutdown including
> > > > many housekeeping tasks.
> > >
> > > HTH,
> > > Brad
> > >
> > > >
> > > > Does osd handle this signal and take care of closing all components?
> > > >
> > > > > I am specifically interested in whether it closes objectstore -> keyvaluedb.
> > > > >
> > > > > I don't see my keyvaluedb shutdown/close code being called
> > > > > when I run ./stop.sh
> > > >
> > > > Any argument or way to force this?
> > > >
> > > > -Ramesh
> > > > > PLEASE NOTE: The information contained in this electronic mail
> > > > > message is intended only for the use of the designated recipient(s)
> > > > > named above. If the reader of this message is not the intended
> > > > > recipient, you are hereby notified that you have received this
> > > > > message in error and that any review, dissemination, distribution,
> > > > > or copying of this message is strictly prohibited. If you have
> > > > > received this communication in error, please notify the sender by
> > > > > telephone or e-mail (as shown above) immediately and destroy any and
> > > > > all copies of this message in your possession (whether hard copies
> > > > > or electronically stored copies).
> > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx More
> > > majordomo
> > > > info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > >
> > > --
> > > Cheers,
> > > Brad