Re: cephfs 'lag' / hang

Don Waterloo <don.waterloo@xxxxxxxxx> · Mon, 21 Dec 2015 10:01:11 -0500

On 21 December 2015 at 03:23, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Sat, Dec 19, 2015 at 4:34 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:

> I have 3 systems w/ a cephfs mounted on them.
> And I am seeing material 'lag'. By 'lag' I mean it hangs for short periods
> (1s, sometimes 5s), but it is very hard to reproduce.
>
> If I run
>   time find . -type f -print0 | xargs -0 stat > /dev/null
> it might take ~130ms.
> But it might take 10s. Once I've done it, it tends to stay at ~130ms,
> suggesting whatever data it needs is now in cache. When it hangs, if I remove
> the stat, it's hanging on the find of one file. It might hiccup 1 or 2 times
> in the find across 10k files.
>

When the operation hangs, do you see any 'slow request ...' log messages in
the cluster log? Also, do you have multiple clients accessing the
filesystem? Which version of ceph do you use?

Regards

Yan, Zheng

There are some 'slow request' entries in the cluster log, for example:

ceph.log.1.gz:2015-12-20 21:48:51.047945 osd.5 10.100.10.124:6801/46249 561 : cluster [WRN] slow request 30.492476 seconds old, received at 2015-12-20 21:48:20.555383: osd_op(client.1294098.1:315704 10000056ffe.00000000 [write 0~12475] 13.bf7fb0aa snapc 1=[] ondisk+write e2459) currently waiting for subops from 1
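
Reading that entry literally: osd.5 logged that a write it received at 21:48:20 was already 30.49 seconds old and was still "waiting for subops from 1", which as far as I can tell means it was waiting on OSD 1 to acknowledge its part of the write. To pull those fields out of many such lines, here is a minimal Python sketch. It assumes the exact layout of the line above; the regex and the name parse_slow_request are my own and not anything Ceph ships, and other releases may format the message differently.

```python
import re

# Pattern inferred from the single log line quoted above (an assumption,
# not an official Ceph log grammar).
SLOW_RE = re.compile(
    r"(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<osd>osd\.\d+) .*?"
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r".* currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict of fields from one 'slow request' line, or None."""
    m = SLOW_RE.search(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    info["age"] = float(info["age"])  # seconds the request had been pending
    return info


# The log line from this message, used as sample input.
sample = (
    "ceph.log.1.gz:2015-12-20 21:48:51.047945 osd.5 10.100.10.124:6801/46249 "
    "561 : cluster [WRN] slow request 30.492476 seconds old, received at "
    "2015-12-20 21:48:20.555383: osd_op(client.1294098.1:315704 "
    "10000056ffe.00000000 [write 0~12475] 13.bf7fb0aa snapc 1=[] "
    "ondisk+write e2459) currently waiting for subops from 1"
)

info = parse_slow_request(sample)
print(info["osd"], info["age"], info["state"])
```

Grouping the results by the reporting OSD and by the state field should show whether the stalls keep pointing at the same OSD.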

It's ceph 0.94.5-0ubuntu0.15.10.1 on Ubuntu 15.10 w/ kernel 4.3.0-040300-generic.

What does the 'slow request' mean?

The file system is mounted on 3 hosts. The others might be doing some minor access I suppose, but nothing systemic.

I've had smokeping running between all the osd machines and see 0 loss and near-zero latency at all times, e.g. a 200us average, +/- 75us.
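
To narrow down which file the find/stat run stalls on, a rough Python sketch along these lines could replace the find | xargs stat pipeline. The function name slow_stats and the 1 second threshold are my own choices for illustration. It only times the per-file lstat call, so a stall inside the directory listing itself would not be attributed to any file.

```python
import os
import time


def slow_stats(root, threshold_s=1.0):
    """Walk root, lstat every file entry, and return (path, seconds)
    for each call that took at least threshold_s seconds."""
    slow = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.monotonic()
            try:
                os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            elapsed = time.monotonic() - start
            if elapsed >= threshold_s:
                slow.append((path, elapsed))
    return slow
```

Running slow_stats(".") from the cephfs mount point and printing the result gives the paths and durations of the slow calls, which can then be lined up against the timestamps of the 'slow request' entries in the cluster log.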

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com