On 01/31/2013 01:00 PM, Sage Weil wrote:
> On Thu, 31 Jan 2013, Jim Schutt wrote:
>> On 01/31/2013 05:43 AM, Sage Weil wrote:
>>> Hi-
>>>
>>> Can you reproduce this with logs?  It looks like there are a few ops that
>>> are hanging for a very long time, but there isn't enough information here
>>> except to point to osds 610, 612, 615, and 68...
>>
>> FWIW, I have a small pile of disks with bad sections in
>> them - these sections aren't bad enough that writes fail,
>> but they are bad enough that throughput drops by a factor
>> of ~20.
>>
>> Do OSDs already collect statistics on, say, op commit
>> elapsed time (assuming that's the statistic most directly
>> related to slow writes on only some sections of a disk),
>> in a way that could be used to diagnose such disks?
>>
>> If not, is there enough structure already in place that
>> it would be easy to add?
>
> They're tracking it internally, but it gets averaged into the totals
> before a user gets to see any per-request latencies.  The per-daemon
> totals are available via the admin socket 'perf dump' command.  Have you
> looked at that information yet?

Not yet - thanks for the pointer.  I'll take a look, and see what
happens when I put one of my questionable disks back in.

Thanks!

-- Jim

>
> sage
>
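
[Editor's note: for anyone following along, below is a minimal sketch of
pulling those admin-socket counters and computing the averaged latencies
Sage describes (sum / avgcount).  It is not from the thread.  The socket
path and the counter names (e.g. filestore/journal_latency,
osd/op_w_latency) are assumptions and vary by Ceph version; only the
'ceph --admin-daemon <sock> perf dump' invocation itself is standard.]

    #!/usr/bin/env python
    # Sketch: read 'perf dump' from an OSD admin socket and print the
    # running average latency (sum / avgcount) for every counter that
    # looks like a latency.  Counter names and the socket path are
    # version-dependent assumptions; adjust for your cluster.
    import json
    import subprocess
    import sys

    def perf_dump(sock):
        """Run 'ceph --admin-daemon <sock> perf dump', return parsed JSON."""
        out = subprocess.check_output(
            ['ceph', '--admin-daemon', sock, 'perf', 'dump'])
        return json.loads(out.decode('utf-8'))

    def print_latencies(counters):
        """Print sum/avgcount for counters shaped like latency averages."""
        for section, stats in sorted(counters.items()):
            for name, val in sorted(stats.items()):
                # Latency counters come back as {'avgcount': N, 'sum': S}.
                if isinstance(val, dict) and 'avgcount' in val and 'sum' in val:
                    count = val['avgcount']
                    avg = float(val['sum']) / count if count else 0.0
                    print('%s.%s: avg %.6fs over %d ops'
                          % (section, name, avg, count))

    if __name__ == '__main__':
        # e.g. /var/run/ceph/ceph-osd.610.asok (hypothetical path)
        sock = sys.argv[1] if len(sys.argv) > 1 else '/var/run/ceph/ceph-osd.0.asok'
        print_latencies(perf_dump(sock))

Comparing the per-OSD averages across the cluster (or sampling the same
OSD before and after swapping in a questionable disk) is one way to spot
the kind of slow-but-not-failing drive described above.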