You are right, using kill on the osd process does call shutdown -> umount.

I think there is some ordering problem in stop.sh.

-Regards,
Ramesh

> -----Original Message-----
> From: Somnath Roy
> Sent: Monday, July 11, 2016 12:44 PM
> To: Ramesh Chander; Sage Weil
> Cc: Brad Hubbard; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: SIGTERM and osd close
>
> Communication with the monitor does not seem to be happening; it is stuck
> there. Is your monitor running?
> I am not super familiar with vstart as I rarely use it, but, if I remember
> correctly, stop.sh stops all the services including mon.
> I can see store->umount is being called from OSD::shutdown(), so it should
> clean up properly.
> Without running stop.sh, if you send a kill signal (without -9 of course) to
> the osd, it should execute shutdown properly.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Ramesh Chander
> Sent: Sunday, July 10, 2016 11:42 PM
> To: Sage Weil
> Cc: Brad Hubbard; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: SIGTERM and osd close
>
> I tried to trace the shutdown call with signal SIGTERM in the osd.
>
> It seems the shutdown call never reached BlueStore::umount.
>
> Steps:
>
> 1. Start osd.
> 2. Attach gdb and set breakpoints:
> (gdb) info b
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00007f5f8ba5e8a0 in OSD::shutdown() at osd/OSD.cc:2599
> 2       breakpoint     keep y   0x00007f5f8bd65160 in BlueStore::umount() at os/bluestore/BlueStore.cc:2686
>
> 3. Trigger stop.sh
>
> Breakpoint 1 is hit, but the second breakpoint is never hit.
> It gets stuck somewhere in this call:
>
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> #1  0x00007f5f8ba515b5 in WaitUntil (when=..., mutex=..., this=0x7f5f95428e20) at ./common/Cond.h:72
> #2  OSDService::prepare_to_stop (this=this@entry=0x7f5f954275c8) at osd/OSD.cc:1174
> #3  0x00007f5f8ba5e8cb in OSD::shutdown (this=this@entry=0x7f5f95426000) at osd/OSD.cc:2600
> #4  0x00007f5f8ba604d0 in OSD::handle_signal (this=0x7f5f95426000, signum=<optimized out>) at osd/OSD.cc:1739
> #5  0x00007f5f8c0209b7 in SignalHandler::entry (this=0x7f5f952a8560) at global/signal_handler.cc:252
> #6  0x00007f5f89f19182 in start_thread (arg=0x7f5f686d7700) at pthread_create.c:312
> #7  0x00007f5f87e2c00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Any idea what is happening? I think it is working with recovery in place,
> but it never does a graceful shutdown?
>
> Or am I missing something here?
>
> -Ramesh
>
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: Friday, July 01, 2016 7:32 PM
> > To: Ramesh Chander
> > Cc: Brad Hubbard; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> > Subject: RE: SIGTERM and osd close
> >
> > On Fri, 1 Jul 2016, Ramesh Chander wrote:
> > > Thank you all for reply,
> > >
> > > Brad,
> > >
> > > I should trace the code path you pointed out.
> >
> > In this case, the important bit is BlueFS::umount(), which calls
> > BlueFS::_stop_alloc(). BlueStore::_close_db() should be calling
> > bluefs->umount(). Any of the unit tests should be triggering these
> > bluefs code paths.
> >
> > sage
> >
> > >
> > > -Regards,
> > > Ramesh
> > >
> > > > -----Original Message-----
> > > > From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> > > > Sent: Friday, July 01, 2016 3:48 AM
> > > > To: Somnath Roy
> > > > Cc: Ramesh Chander; ceph-devel@xxxxxxxxxxxxxxx
> > > > Subject: Re: SIGTERM and osd close
> > > >
> > > > On Fri, Jul 1, 2016 at 4:44 AM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> > > > > You need to call it from BlueStore::umount() I guess for cleanup work..
> > > > >
> > > > > -----Original Message-----
> > > > > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > > > > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Ramesh Chander
> > > > > Sent: Thursday, June 30, 2016 8:46 AM
> > > > > To: ceph-devel@xxxxxxxxxxxxxxx
> > > > > Subject: SIGTERM and osd close
> > > > >
> > > > > Hi All,
> > > > >
> > > > > When I use stop.sh without any argument, I suppose it calls pkill with
> > > > > SIGTERM on osds as well as other processes.
> > > >
> > > > 616   // install signal handlers
> > > > 617   init_async_signal_handler();
> > > > 618   register_async_signal_handler(SIGHUP, sighup_handler);
> > > > 619   register_async_signal_handler_oneshot(SIGINT, handle_osd_signal);
> > > > 620   register_async_signal_handler_oneshot(SIGTERM, handle_osd_signal);
> > > >
> > > > 65 void handle_osd_signal(int signum)
> > > > 66 {
> > > > 67   if (osd)
> > > > 68     osd->handle_signal(signum);
> > > > 69 }
> > > >
> > > > 1735 void OSD::handle_signal(int signum)
> > > > 1736 {
> > > > 1737   assert(signum == SIGINT || signum == SIGTERM);
> > > > 1738   derr << "*** Got signal " << sig_str(signum) << " ***" << dendl;
> > > > 1739   shutdown();
> > > > 1740 }
> > > >
> > > > 2598 int OSD::shutdown()
> > > > 2599 {
> > > >
> > > > OSD::shutdown() in src/osd/OSD.cc is quite a large function that
> > > > performs quite a bit of clean up such as draining and shutting down
> > > > thread pool work queues, shutting down messenger instances,
> > > > un-registering admin commands, shutting down the PGs, flushing
> > > > outstanding ops, updating the superblock and unmounting the filestore
> > > > (as Somnath mentioned this might be where you want to look), shutting
> > > > down the MON client and clearing the peering work queue, in no
> > > > particular order.
> > > >
> > > > So there is no doubt the OSD (and other daemons such as MON and MDS)
> > > > intercepts this signal and performs a graceful shutdown including
> > > > many housekeeping tasks.
> > > >
> > > > HTH,
> > > > Brad
> > > >
> > > > >
> > > > > Does osd handle this signal and take care of closing all components?
> > > > >
> > > > > I am specifically interested in if it closes objectstore -> keyvaluedb .
> > > > >
> > > > > I don't see my code of keyvaluedb shutdown/close being called
> > > > > when I do ./stop.sh
> > > > >
> > > > > Any argument or way to force this?
> > > > >
> > > > > -Ramesh
> > > > > PLEASE NOTE: The information contained in this electronic mail
> > > > > message is intended only for the use of the designated recipient(s)
> > > > > named above. If the reader of this message is not the intended
> > > > > recipient, you are hereby notified that you have received this
> > > > > message in error and that any review, dissemination, distribution,
> > > > > or copying of this message is strictly prohibited. If you have
> > > > > received this communication in error, please notify the sender by
> > > > > telephone or e-mail (as shown above) immediately and destroy any and
> > > > > all copies of this message in your possession (whether hard copies
> > > > > or electronically stored copies).
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > >
> > > > --
> > > > Cheers,
> > > > Brad
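
Note on the call path discussed in this thread: frames #1 and #2 of the backtrace show the signal-handling thread sitting in a timed condition wait inside OSDService::prepare_to_stop(), which OSD::shutdown() calls before anything else. The following is only a minimal sketch of that general shape (signal handler -> shutdown -> bounded wait for an acknowledgement -> store unmount). It is not Ceph code: ToyStore, ToyDaemon, got_stop_ack() and the timeout parameter are hypothetical names made up for illustration, and the thread does not establish how long the real wait lasts or what it waits for.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an object store; it only records that umount() ran.
struct ToyStore {
  bool mounted = true;
  void umount() { mounted = false; }
};

// Hypothetical daemon (not Ceph's OSD class) with the same call shape as the
// thread: handle_signal() -> shutdown() -> prepare_to_stop() -> store umount.
class ToyDaemon {
 public:
  explicit ToyDaemon(std::chrono::milliseconds ack_timeout)
      : ack_timeout_(ack_timeout) {}

  // Would be called when a peer acknowledges the stop request.
  void got_stop_ack() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      acked_ = true;
    }
    cond_.notify_all();
  }

  void handle_signal(int signum) {
    assert(signum == SIGINT || signum == SIGTERM);
    log_.push_back("got signal " + std::to_string(signum));
    shutdown();
  }

  void shutdown() {
    const bool acked = prepare_to_stop();
    log_.push_back(acked ? "ack received" : "ack timed out");
    store_.umount();  // reached in both cases because the wait is bounded
    log_.push_back("store unmounted");
  }

  bool store_mounted() const { return store_.mounted; }
  const std::vector<std::string>& log() const { return log_; }

 private:
  // Bounded wait: true if the ack arrived, false if the deadline passed.
  bool prepare_to_stop() {
    std::unique_lock<std::mutex> guard(lock_);
    return cond_.wait_for(guard, ack_timeout_, [this] { return acked_; });
  }

  std::chrono::milliseconds ack_timeout_;
  std::mutex lock_;
  std::condition_variable cond_;
  bool acked_ = false;
  ToyStore store_;
  std::vector<std::string> log_;
};
```

In this sketch, constructing ToyDaemon with a 50 ms timeout and calling handle_signal(SIGTERM) without ever calling got_stop_ack() still ends with the store unmounted, only later. That is the behaviour one would hope for when the acknowledging peer has already been stopped, which is the ordering concern raised about stop.sh above. A breakpoint on the unmount step would appear "never hit" only for as long as the wait is still pending.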