On Fri, Apr 8, 2011 at 3:11 PM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> Sage Weil wrote:
>
>>
>> I would also be interested in seeing a system level profile (oprofile?) to
>> see where CPU time is being spent. There are likely low hanging fruit in
>> the OSD that would reduce CPU overhead.
>
> This will take me a little while, since I need to learn
> about the tools. But since I need to learn about them
> anyway, that's a good thing.

oprofile is surprisingly easy to get started with. We have a wiki page
about it:
http://ceph.newdream.net/wiki/Cpu_profiling

>
>>
>> I guess the other thing that would help to confirm this is to just halve
>> the number of OSDs on your machines in a test and see if the problem goes
>> away.
>
> I was going to try this first, exactly because it seems like
> a definitive test.
>
>>
>>> If my analysis above is correct, do you think anything
>>> can be gained by running the heartbeat and heartbeat
>>> dispatcher threads as SCHED_RR threads? Since tick() runs
>>> heartbeat_check(), that would also need to be SCHED_RR,
>>> or the heartbeats could arrive on time, but not checked
>>> until it was too late.

Thanks for the ideas. However, I doubt that making the OSD::tick() thread
SCHED_RR would really work. The OSD::tick() code is taking locks all over
the place. Since a bunch of other threads besides the tick thread can be
holding those locks, this would soon result in priority inversion.

Not to mention, heartbeat_messenger has its own thread(s) which actually
perform the work of sending the heartbeat messages.

cheers,
Colin
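
To make the priority inversion concern concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions, not Ceph code: one CPU, strict priority scheduling, a single lock, and three hypothetical tasks named "worker", "tick", and "other" (the names, priorities, and step lists are all made up for illustration). The high-priority "tick" task needs a lock that the low-priority "worker" already holds, so any medium-priority task that keeps the CPU busy delays "tick" as well.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int      # larger number = more urgent
    release: int       # time slot at which the task becomes ready
    steps: list        # each step is "lock", "work", or "unlock"
    pc: int = 0        # index of the next step to run


def simulate(tasks, max_slots=1000):
    """Strict-priority schedule on one CPU.

    Returns {task name: time slot in which its last step ran}.
    """
    lock_owner = None
    finished = {}
    for now in range(max_slots):
        if len(finished) == len(tasks):
            break
        # A task waiting on a held lock is blocked, whatever its priority.
        runnable = [
            t for t in tasks
            if t.release <= now
            and t.pc < len(t.steps)
            and not (t.steps[t.pc] == "lock" and lock_owner is not None)
        ]
        if not runnable:
            continue
        # The most urgent runnable task gets this time slot.
        task = max(runnable, key=lambda t: t.priority)
        step = task.steps[task.pc]
        if step == "lock":
            lock_owner = task.name
        elif step == "unlock":
            lock_owner = None
        task.pc += 1
        if task.pc == len(task.steps):
            finished[task.name] = now
    return finished


def scenario(with_competitor):
    # Hypothetical workload: "worker" grabs the lock just before the
    # high-priority "tick" task wakes up and asks for the same lock.
    tasks = [
        Task("worker", 1, 0, ["lock", "work", "work", "unlock"]),
        Task("tick", 3, 1, ["lock", "work", "unlock"]),
    ]
    if with_competitor:
        # Medium-priority task that never touches the lock.
        tasks.append(Task("other", 2, 1, ["work"] * 5))
    return simulate(tasks)


if __name__ == "__main__":
    print(scenario(with_competitor=False))
    print(scenario(with_competitor=True))
```

In this toy model "tick" completes in slot 6 when only "worker" is present, but in slot 11 once "other" is added, even though "other" has lower priority than "tick" and never takes the lock. A real kernel scheduler is far more involved, so treat this only as an illustration of why raising one thread's priority does not help while lower-priority threads hold the locks it needs.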