Re: Slow request in XFS

Sage Weil <sage@xxxxxxxxxxx> · Thu, 31 Jan 2013 12:00:48 -0800 (PST)

On Thu, 31 Jan 2013, Jim Schutt wrote:
> On 01/31/2013 05:43 AM, Sage Weil wrote:
> > Hi-
> > 
> > Can you reproduce this with logs?  It looks like there are a few ops that 
> > are hanging for a very long time, but there isn't enough information here 
> > except to point to osds 610, 612, 615, and 68...
> 
> FWIW, I have a small pile of disks with bad sections in
> them - these sections aren't bad enough that writes fail,
> but they are bad enough that throughput drops by a factor
> of ~20.
> 
> Do OSDs already collect statistics on, say, op commit
> elapsed time (assuming that's the statistic most directly
> related to slow writes on only some sections of a disk),
> in a way that could be used to diagnose such disks?
> 
> If not, is there enough structure already in place that
> it would be easy to add?

The OSDs track it internally, but per-request latencies are averaged 
into per-daemon totals before a user gets to see them.  Those totals 
are available via the admin socket 'perf dump' command.  Have you 
looked at that information yet?
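
As a rough sketch of how those totals could be turned into something 
comparable across OSDs: the script below assumes the 'perf dump' output 
has been saved as JSON (for example with 
"ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok perf dump"; the socket 
path is an assumption and depends on your configuration) and that latency 
counters appear as objects with "avgcount" and "sum" fields.  The function 
names are made up for illustration, and no specific counter names are 
assumed.

```python
import json
import sys


def average_latencies(perf_dump):
    """Return {(section, counter): mean} for every avgcount/sum counter.

    Assumes averaged counters are dicts holding "avgcount" and "sum";
    plain integer counters and empty averages are skipped.
    """
    averages = {}
    for section, counters in perf_dump.items():
        if not isinstance(counters, dict):
            continue
        for name, value in counters.items():
            if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                count = value["avgcount"]
                if count > 0:
                    averages[(section, name)] = value["sum"] / count
    return averages


def slowest(averages, limit=5):
    """Return the `limit` counters with the largest mean, largest first."""
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python perf_avg.py osd.N.json  (file name is hypothetical)
    with open(sys.argv[1]) as f:
        dump = json.load(f)
    for (section, name), mean in slowest(average_latencies(dump)):
        print(f"{section}.{name}: {mean:.6f}")
```

Running this against dumps from several OSDs and comparing the means may 
make a daemon sitting on a slow disk stand out, although a lifetime 
average can hide slowness that only affects some sections of a disk.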

sage