On 01/31/2013 01:00 PM, Sage Weil wrote:
> On Thu, 31 Jan 2013, Jim Schutt wrote:
>> On 01/31/2013 05:43 AM, Sage Weil wrote:
>>> Hi-
>>>
>>> Can you reproduce this with logs?  It looks like there are a few ops that
>>> are hanging for a very long time, but there isn't enough information here
>>> except to point to osds 610, 612, 615, and 68...
>>
>> FWIW, I have a small pile of disks with bad sections in
>> them - these sections aren't bad enough that writes fail,
>> but they are bad enough that throughput drops by a factor
>> of ~20.
>>
>> Do OSDs already collect statistics on, say, op commit
>> elapsed time (assuming that's the statistic most directly
>> related to slow writes on only some sections of a disk),
>> in a way that could be used to diagnose such disks?
>>
>> If not, is there enough structure already in place that
>> it would be easy to add?
>
> They're tracking it internally, but it gets averaged into the totals
> before a user gets to see any per-request latencies.  The per-daemon
> totals are available via the admin socket 'perf dump' command.  Have you
> looked at that information yet?

Not yet - thanks for the pointer.  I'll take a look, and see what
happens when I put one of my questionable disks back in.

Thanks!

-- Jim

>
> sage
>
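
[Editor's note: for anyone following along, below is a minimal sketch of
pulling those admin-socket counters and computing the averaged latencies
Sage describes (sum / avgcount).  It is not from the thread.  The socket
path and the counter names (e.g. filestore/journal_latency,
osd/op_w_latency) are assumptions and vary by Ceph version; only the
'ceph --admin-daemon <sock> perf dump' invocation itself is standard.]

    #!/usr/bin/env python
    # Sketch: read 'perf dump' from an OSD admin socket and print the
    # running average latency (sum / avgcount) for every counter that
    # looks like a latency.  Counter names and the socket path are
    # version-dependent assumptions; adjust for your cluster.
    import json
    import subprocess
    import sys

    def perf_dump(sock):
        """Run 'ceph --admin-daemon <sock> perf dump', return parsed JSON."""
        out = subprocess.check_output(
            ['ceph', '--admin-daemon', sock, 'perf', 'dump'])
        return json.loads(out.decode('utf-8'))

    def print_latencies(counters):
        """Print sum/avgcount for counters shaped like latency averages."""
        for section, stats in sorted(counters.items()):
            for name, val in sorted(stats.items()):
                # Latency counters come back as {'avgcount': N, 'sum': S}.
                if isinstance(val, dict) and 'avgcount' in val and 'sum' in val:
                    count = val['avgcount']
                    avg = float(val['sum']) / count if count else 0.0
                    print('%s.%s: avg %.6fs over %d ops'
                          % (section, name, avg, count))

    if __name__ == '__main__':
        # e.g. /var/run/ceph/ceph-osd.610.asok (hypothetical path)
        sock = sys.argv[1] if len(sys.argv) > 1 else '/var/run/ceph/ceph-osd.0.asok'
        print_latencies(perf_dump(sock))

Comparing the per-OSD averages across the cluster (or sampling the same
OSD before and after swapping in a questionable disk) is one way to spot
the kind of slow-but-not-failing drive described above.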