Re: Slow request warnings on 0.48

Matthew Richardson <m.richardson@xxxxxxxx> · Thu, 19 Jul 2012 11:48:35 +0100

I'd just like to report the same behaviour on my test cluster with 0.48.

I've set up a single box (SL6.1, kernel 2.6.32-220.23.1) with one mds,
one mon and one osd, and replication set to '1' for both data and metadata.

Having mounted using ceph-fuse, I'm running a simple fio job to create load:

[global]
directory=/mnt/ceph
size=500M
rw=read
ioengine=libaio

[simple]

I'm then watching the latency with ioping.
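
In case it helps anyone reproduce this, here is a rough Python sketch
that regenerates the job file above for a given rw value, since rw is
the only setting I change between runs. The function name fio_job is
just something made up for illustration; the keys and values are the
ones from the job above.

```python
import configparser
import io


def fio_job(rw, directory="/mnt/ceph", size="500M"):
    """Return the text of the fio job above with the given rw mode."""
    job = configparser.ConfigParser()
    job["global"] = {
        "directory": directory,
        "size": size,
        "rw": rw,
        "ioengine": "libaio",
    }
    # Empty job section, as in the original job file; it takes all of
    # its settings from [global].
    job["simple"] = {}
    out = io.StringIO()
    # Write key=value without spaces, matching the layout of the job above.
    job.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Print the variant that shows the problem for me.
    print(fio_job("randwrite"))
```

The printed text can be saved to a file and passed to fio as the job
file.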

With rw=read, rw=randread (random reads) or rw=write (sequential writes)
I see no problems and the latency sits around 1-2ms.  However, with
rw=randwrite (random writes) I see the latency jump to between 5 and 60
seconds, and the following types of warning lines appear:

2012-07-19 10:29:39.417625 osd.0 [WRN] 11 slow requests, 6 included
below; oldest blocked for > 54.425766 secs
[WRN] slow request 54.420958 seconds old, received at 2012-07-19
10:28:44.996584: osd_op(client.4113.0:9153 100000003ed.0000003b [write
847872~4096] 0.dc4b476f snapc 1=[]) v4 currently started
2012-07-19 10:29:39.417641 osd.0 [WRN] slow request 54.420587 seconds
old, received at 2012-07-19 10:28:44.996955: osd_op(client.4113.0:9154
100000003ed.00000000 [write 1175552~4096] 0.44a7cb80 snapc 1=[]) v4
currently started
[...snip...]
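
For picking these warnings apart, a minimal Python sketch along the
following lines might be useful. It assumes the bracketed pair in each
osd_op is offset~length in bytes, and that the warnings look exactly
like the ones quoted above (I have only checked it against those). The
names SLOW_REQUEST and parse_slow_requests are my own, not anything
from ceph.

```python
import re

# Matches one "slow request" warning once line wraps have been removed.
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<op>\S+) (?P<obj>\S+) "
    r"\[(?P<kind>\w+) (?P<offset>\d+)~(?P<length>\d+)\]"
)

# The two warnings quoted above, wrapped the same way as in the mail.
SAMPLE = """\
2012-07-19 10:29:39.417625 osd.0 [WRN] 11 slow requests, 6 included
below; oldest blocked for > 54.425766 secs
[WRN] slow request 54.420958 seconds old, received at 2012-07-19
10:28:44.996584: osd_op(client.4113.0:9153 100000003ed.0000003b [write
847872~4096] 0.dc4b476f snapc 1=[]) v4 currently started
2012-07-19 10:29:39.417641 osd.0 [WRN] slow request 54.420587 seconds
old, received at 2012-07-19 10:28:44.996955: osd_op(client.4113.0:9154
100000003ed.00000000 [write 1175552~4096] 0.44a7cb80 snapc 1=[]) v4
currently started
"""


def parse_slow_requests(text):
    """Return one dict per slow-request warning found in text."""
    # Collapse all whitespace so warnings wrapped across lines still match.
    flat = " ".join(text.split())
    return [
        {
            "age": float(m.group("age")),
            "received": m.group("received"),
            "op": m.group("op"),
            "obj": m.group("obj"),
            "kind": m.group("kind"),
            "offset": int(m.group("offset")),
            "length": int(m.group("length")),
        }
        for m in SLOW_REQUEST.finditer(flat)
    ]


if __name__ == "__main__":
    for req in parse_slow_requests(SAMPLE):
        print(req["age"], req["obj"], req["kind"], req["offset"], req["length"])
```

Both quoted requests are 4096-byte writes, which is what I would expect
from fio's default block size.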

Let me know if there's any more information I can provide that might
help with diagnosing the problem (bearing in mind that I'm new to Ceph,
so I might need extra notes on generating tests, dumps, etc. :) )

Thanks,

Matthew

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
