On Fri, 2011-03-11 at 15:45 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > So it occurs to me that one call to Message::put() entails many
> > > calls to buffer::ptr::release(), depending on what the message
> > > is, right? Maybe time the "delete _raw" in there and assert()
> > > if it's too long?
> >
> > Also, any chance all incoming data is causing buffer_total_alloc
> > to be contended? I don't have libatomic_ops either, so that
> > atomic_t is implemented via a pthread_spinlock_t, right?
> > How to check that?
>
> Hmm, it could be. I pushed a nobuffer branch that compiles out the
> buffer_total_alloc accounting, if you want to give that a go.

That seems to have helped, although it's not a complete solution.

I still got some OSDs failed, but since I use

  osd min down reporters = 3
  osd min down reports = 2

only 1 OSD got marked down; it noticed quickly and marked itself
up, and my 64-client dd finished. That's new for me at 96 OSDs.

I saw this on this run:

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
192

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
12578

which compares to a previous run in an earlier email:

> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> > > 8911
> > >
> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> > > 415

-- Jim

>
> sage
>
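
For anyone repeating the comparison, the two grep pipelines above could be folded into a single pass over the logs. This is only a rough sketch: it assumes each relevant log line contains "RefCountedObject::put" followed somewhere by "took <seconds>.<fraction>", which is all the grep patterns themselves rely on. The function name and bucket labels are made up for illustration. It also parses the number, so its "10 seconds or more" bucket is slightly stricter than the two-digit egrep pattern.

```python
import re
from collections import Counter

# Assumed line shape (inferred from the grep patterns, not from Ceph source):
#   ... RefCountedObject::put ... took <seconds>.<fraction> ...
TOOK_RE = re.compile(r"RefCountedObject::put.*?took (\d+\.\d*)")


def tally_put_durations(lines):
    """Count RefCountedObject::put timings per duration bucket.

    Hypothetical helper: bucket names are illustrative only.
    """
    counts = Counter()
    for line in lines:
        match = TOOK_RE.search(line)
        if match is None:
            continue  # not a put() timing line
        seconds = float(match.group(1))
        if seconds >= 10:
            counts["10s_or_more"] += 1   # roughly: took [0-9][0-9]\.
        elif seconds >= 1:
            counts["1s_to_10s"] += 1     # roughly: took [1-9]\.
        else:
            counts["under_1s"] += 1
    return counts


# Made-up sample lines, only to show the call; not real OSD log output.
sample = [
    "osd.3.log:101:RefCountedObject::put took 12.402",
    "osd.3.log:102:RefCountedObject::put took 3.117",
    "osd.7.log:55:some unrelated message",
]
print(dict(tally_put_durations(sample)))
```

To run it over real logs, pass an iterable of lines, for example `tally_put_durations(fileinput.input())` with the `osd.*.log` files on the command line.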