On Wed, 16 Feb 2011, Jim Schutt wrote:
> On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
> > On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
> > > Hi,
> > >
> > > I've been testing v0.24.3 w/ 64 clients against
> > > 1 mon, 1 mds, 96 osds. Under heavy write load I
> > > see:
> > > [WRN] map e7 wrongly marked me down or wrong addr
> > >
> > > I was able to sort through the logs and discover that when
> > > this happens I have large gaps (10 seconds or more) in osd
> > > heartbeat processing. In those heartbeat gaps I've discovered
> > > long periods (5-15 seconds) where an osd logs nothing, even
> > > though I am running with debug osd/filestore/journal = 20.
> > >
> > > Is this a known issue?
> >
> > You're running on btrfs?
>
> Yep.

Are the cosd log files on the same btrfs volume as the btrfs data, or
elsewhere? The heartbeat thread takes some pains to avoid any locks that
may be contended and to avoid any disk I/O, so in theory a btrfs stall
shouldn't affect anything. We may have missed something... do you have a
log showing this in action?

sage

> > We've come across some issues involving very long sync times that
> > I believe manifest like this. Sage is looking into them, although
> > it's delayed at the moment thanks to FAST 11. :)
>
> OK, great.
>
> -- Jim
>
> > -Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
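
For anyone who wants to produce the kind of log asked for above, here is a minimal sketch of a script that finds long silent periods in a single osd log. It is not taken from the thread: the function names `parse_ts` and `find_gaps` are made up for illustration, and it assumes every log line begins with a timestamp of the form `YYYY-MM-DD HH:MM:SS.ffffff`, which may not match your cosd output. The 10-second default mirrors the gap size Jim reported.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line (an assumption,
# adjust to whatever your cosd logs actually contain).
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2011-02-16 14:40:00.000000")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=10.0):
    """Return (start, end, seconds) for each silence of at least threshold seconds."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            # Skip continuation lines or anything without a timestamp.
            continue
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= threshold:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py /path/to/osd.log
    with open(sys.argv[1], errors="replace") as f:
        for start, end, seconds in find_gaps(f):
            print(f"{start} -> {end}: {seconds:.1f}s with no log output")
```

Running this against each osd log and lining up the reported windows with the time of the "wrongly marked me down" warning should show whether the silent periods and the heartbeat gaps coincide.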