On Fri, 2011-02-18 at 00:13 -0700, Sage Weil wrote: > On Thu, 17 Feb 2011, Jim Schutt wrote: > > Check out the result: > > > > osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick > > osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock > > osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock > > osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report > > osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs > > osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient > > osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick <== > > osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap > > osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done > > osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick > > osd.68.log:256280:2011-02-17 15:45:18.481705 7fd42ad57940 osd68 5 tick getting read lock on map_lock > > osd.68.log:256281:2011-02-17 15:45:18.481712 7fd42ad57940 osd68 5 tick got read lock on map_lock > > osd.68.log:256688:2011-02-17 15:45:20.010705 7fd42ad57940 osd68 5 tick sending mon report > > osd.68.log:256753:2011-02-17 15:45:20.012950 7fd42ad57940 osd68 5 tick removing stray pgs > > osd.68.log:256754:2011-02-17 15:45:20.012959 7fd42ad57940 osd68 5 tick sending log to logclient > > osd.68.log:256755:2011-02-17 15:45:20.012965 7fd42ad57940 osd68 5 tick arming timer for next tick > > osd.68.log:256756:2011-02-17 15:45:20.012976 7fd42ad57940 osd68 5 tick checking dispatch queue status > > osd.68.log:256757:2011-02-17 15:45:20.012993 7fd42ad57940 osd68 5 tick done > > > > Why should it take 28 seconds to add a new timer event? > > Huh.. that is pretty weird. I see multiple sync in there, too, so it's > not like something was somehow blocking on a btrfs commit. > > Anybody else have ideas? :/ So I thought I'd add debug timer = 20 to the [osd] section in my config. Rebuilding my filesystem resulted in this: importing contents of /tmp/keyring.mds.an15 into /tmp/monkeyring.10158 creating /tmp/keyring.osd.0 cauthtool: ./common/Mutex.h:118: void Mutex::Lock(bool): Assertion `r == 0' failed. *** Caught signal (Aborted) *** in thread 7f6e8f29f6f0 ceph version 0.24.3 (commit:97d65f2cbca2046fa9a014fcdff1628641d43fd8) 1: (ceph::BackTrace::BackTrace(int)+0x2a) [0x45cd8c] 2: /usr/bin/cauthtool [0x466a9d] 3: /lib64/libpthread.so.0 [0x7f6e8ee93b10] 4: (gsignal()+0x35) [0x7f6e8dcfe265] 5: (abort()+0x110) [0x7f6e8dcffd10] 6: (__assert_fail()+0xf6) [0x7f6e8dcf76e6] 7: (Mutex::Lock(bool)+0x88) [0x45b5e6] 8: /usr/bin/cauthtool [0x45ba1e] 9: (operator<<(std::ostream&, _dbeginl_t)+0x11) [0x45c733] a: (SafeTimer::shutdown()+0x2a) [0x484d6e] b: /usr/bin/cauthtool [0x480a0b] c: (FlusherStopper::~FlusherStopper()+0x11) [0x48322d] d: /usr/bin/cauthtool [0x480a2e] e: (exit()+0xe5) [0x7f6e8dd013a5] f: (__libc_start_main()+0xfb) [0x7f6e8dceb99b] 10: (__gxx_personality_v0()+0x2d1) [0x44ce09] I guess I'll rebuild it without timer debugging, then start it with timer debugging, and see what happens. -- Jim > > sage > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html