On Wed, 2011-02-23 at 11:12 -0700, Gregory Farnum wrote: > On Wednesday, February 23, 2011 at 9:52 AM, Jim Schutt wrote: > > > > On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote: > > > On Thu, 17 Feb 2011, Jim Schutt wrote: > > > > Hi Sage, > > > > > > > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote: > > > > > On Wed, 16 Feb 2011, Sage Weil wrote: > > > > > > shouldn't affect anything. We may have missed something.. do you have a > > > > > > log showing this in action? > > > > > > > > > > Obviously yes, looking at your original email. :) At the beginning of > > > > > each log line we include a thread id. What would be really helpful would > > > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things > > > > > are blocking, either based on the existing output, or by adding additional > > > > > dout lines at interesting points in time. > > > > > > > > I'll take a deeper look at my existing logs with > > > > that in mind; let me know if you'd like me to > > > > send you some. > > > > > > > > I have also been looking at map_lock, as it seems > > > > to be shared between the heartbeat and map update > > > > threads. > > > > > > > > Would instrumenting acquiring/releasing that lock > > > > be helpful? Is there some other lock that may > > > > be more fruitful to instrument? I can reproduce > > > > pretty reliably, so adding instrumentation is > > > > no problem. > > > > > > The heartbeat thread is doing a map_lock.try_get_read() because it > > > frequently is held by another thread, so that shouldn't ever block. > > > > > > The possibilities I see are: > > > - peer_stat_lock > > > - the monc->sub_want / renew_subs calls (monc has an internal lock), > > > although that code should only trigger with a single osd. :/ > > > - heartbeat_lock itself could be held by another thread; i'd instrument > > > all locks/unlocks there, along with the wakeup in heartbeat(). > > > > I think the culprit is osd_lock. > > > > The OSD tick timer needs to acquire the osd_lock mutex to > > wake up, right? So if osd_lock is ever held for an excessive > > time, the tick will be delayed. > > > > I used debug lockdep = 20 on my osds, and saw this via > > egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*" > > > > 2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9) > > 2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock > > 2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9) > > 2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock > > 2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock > > 2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock > > 2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock > > 2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick > > > > I've attached a full log of the 10 second period that osd_lock > > was held. > > > > Is it possible to separate out the parts of tick() that > > need to covered by osd_lock and do them somewhere else, > > so that a new lock, say tick_lock, can be used to > > protect the tick timer? > > > > -- Jim > > > > tick() isn't involved in heartbeats, actually, so any blocks there should be unrelated to heartbeat bugs (and once you're in tick() or functions it calls, there's no losing the heartbeat!). I guess I was confused by tick() calling heartbeat_check(). In all the logs I collect I find the trouble spots by looking for when tick() delayed. > I have managed to get OSDs wrongly marking each other down during startup when they're peering large numbers of PGs/pools, as they disagree on who they need to be heartbeating (due to the slow handling of new osd maps and pg creates); if you're mostly seeing OSDs get incorrectly marked down during low epochs (your original email said epoch 7) this is probably what you're finding. > What I've been trying to look for is heartbeat stalls after I start up a bunch of clients writing. I'm really not sure why that original log caught one at such an early epoch - maybe there's two things going on? > We still have no idea what could be causing the stall *inside* of tick(), though. :/ I think that one was just lucky. Most of the stalls I've collected are between ticks. -- Jim > -Greg > > > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html