On Thu, 31 Jan 2013, Jim Schutt wrote:
> On 01/31/2013 05:43 AM, Sage Weil wrote:
> > Hi-
> >
> > Can you reproduce this with logs?  It looks like there are a few ops that
> > are hanging for a very long time, but there isn't enough information here
> > except to point to osds 610, 612, 615, and 68...
>
> FWIW, I have a small pile of disks with bad sections in
> them - these sections aren't bad enough that writes fail,
> but they are bad enough that throughput drops by a factor
> of ~20.
>
> Do OSDs already collect statistics on, say, op commit
> elapsed time (assuming that's the statistic most directly
> related to slow writes on only some sections of a disk),
> in a way that could be used to diagnose such disks?
>
> If not, is there enough structure already in place that
> it would be easy to add?

They're tracking it internally, but it gets averaged into the totals before
a user gets to see any per-request latencies.  The per-daemon totals are
available via the admin socket 'perf dump' command.  Have you looked at
that information yet?

sage
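
A rough sketch of how the per-daemon totals from 'perf dump' might be turned into something comparable across OSDs. It assumes that averaged counters appear in the JSON as objects with "avgcount" and "sum" fields, and that the socket lives somewhere like /var/run/ceph/ceph-osd.0.asok. Both should be checked against your own version and configuration. The counter name op_w_latency in the sample data is illustrative, and the helper names and the 5x-median threshold are made up for this example, not anything Ceph provides.

```python
import json
import subprocess


def read_perf_dump(socket_path):
    """Run 'perf dump' against one daemon's admin socket and parse the JSON.

    socket_path is an assumption, e.g. /var/run/ceph/ceph-osd.0.asok.
    """
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "perf", "dump"]
    )
    return json.loads(out)


def average_latencies(perf):
    """Return {"section.counter": mean} for every averaged counter.

    Assumes averaged counters look like {"avgcount": N, "sum": S}.
    Counters with avgcount == 0 are skipped to avoid dividing by zero.
    """
    averages = {}
    for section, counters in perf.items():
        if not isinstance(counters, dict):
            continue
        for name, value in counters.items():
            if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                count = value["avgcount"]
                if count:
                    averages[f"{section}.{name}"] = value["sum"] / count
    return averages


def slow_daemons(per_daemon, counter, factor=5.0):
    """List daemons whose mean for `counter` exceeds factor x the median.

    per_daemon maps a daemon name to the dict from average_latencies().
    The threshold is an arbitrary illustration, not a tuned value.
    """
    values = sorted(v[counter] for v in per_daemon.values() if counter in v)
    if not values:
        return []
    median = values[len(values) // 2]
    return sorted(
        d for d, v in per_daemon.items()
        if counter in v and v[counter] > factor * median
    )


if __name__ == "__main__":
    # Hand-written sample data standing in for real 'perf dump' output.
    sample = {"osd": {"op_w_latency": {"avgcount": 4, "sum": 2.0}, "op": 10}}
    print(average_latencies(sample))
```

Note that these are means over the daemon's lifetime, so a disk that is slow only in some sections can be diluted by its fast sections. Sampling twice and dividing the change in "sum" by the change in "avgcount" gives a mean for just that interval, which may show the slow periods more clearly.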