On 21 December 2015 at 03:23, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Sat, Dec 19, 2015 at 4:34 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> I have 3 systems w/ a cephfs mounted on them.
> And I am seeing material 'lag'. By 'lag' I mean it hangs for short periods
> of time (1s, sometimes 5s).
> But it is very hard to reproduce.
>
> If i run
> time find . -type f -print0 | xargs -0 stat > /dev/null
> it might take ~130ms.
> But it might take 10s. Once I've done it, it tends to stay at ~130ms,
> suggesting whatever data it needs is now in cache. In the cases where it
> hangs, if I remove the stat, it's hanging on the find of one file. It might
> hiccup 1 or 2 times in the find across 10k files.
>
When the operation hangs, do you see any 'slow request ...' log messages in
the cluster log? Also, do you have multiple clients accessing the
filesystem? Which version of Ceph do you use?
Regards
Yan, Zheng
There are some 'slow request' entries in the log:
ceph.log.1.gz:2015-12-20 21:48:51.047945 osd.5 10.100.10.124:6801/46249 561 : cluster [WRN] slow request 30.492476 seconds old, received at 2015-12-20 21:48:20.555383: osd_op(client.1294098.1:315704 10000056ffe.00000000 [write 0~12475] 13.bf7fb0aa snapc 1=[] ondisk+write e2459) currently waiting for subops from 1
It's Ceph 0.94.5-0ubuntu0.15.10.1 on Ubuntu 15.10 w/ kernel 4.3.0-040300-generic.
What does the 'slow request' mean?
The file system is mounted on 3 hosts. The others might be doing some minor access I suppose, but nothing systemic.
I've had smokeping running between all the OSD machines and see 0 loss and ~0 latency at all times, e.g. a 200us average, +- 75us.
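
In case it helps with correlating the hangs against the cluster log, here is a rough Python sketch that pulls the fields out of a 'slow request' line like the one above. The regular expression is an assumption based only on that single sample, and the names `SLOW_RE` and `parse_slow_request` are made up for illustration; other Ceph versions may format these lines differently.

```python
import re
from datetime import datetime

# The one sample line from the cluster log quoted above (zgrep-style prefix included).
SAMPLE = (
    "ceph.log.1.gz:2015-12-20 21:48:51.047945 osd.5 10.100.10.124:6801/46249 561 : "
    "cluster [WRN] slow request 30.492476 seconds old, received at "
    "2015-12-20 21:48:20.555383: osd_op(client.1294098.1:315704 "
    "10000056ffe.00000000 [write 0~12475] 13.bf7fb0aa snapc 1=[] "
    "ondisk+write e2459) currently waiting for subops from 1"
)

# Assumed layout, inferred from SAMPLE only:
# <logged time> <osd> ... slow request <age> seconds old,
# received at <time>: <op> currently <state>
SLOW_RE = re.compile(
    r"(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<osd>osd\.\d+) .*?"
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<op>osd_op\(.*\)) currently (?P<state>.+)$"
)

TIME_FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_slow_request(line):
    """Return a dict of fields for a 'slow request' line, or None if it does not match."""
    m = SLOW_RE.search(line.strip())
    if m is None:
        return None
    return {
        "osd": m.group("osd"),                      # OSD that reported the warning
        "age_seconds": float(m.group("age")),       # how long the op had been pending
        "logged": datetime.strptime(m.group("logged"), TIME_FMT),
        "received": datetime.strptime(m.group("received"), TIME_FMT),
        "op": m.group("op"),                        # the raw osd_op(...) text
        "state": m.group("state"),                  # what the op was waiting on
    }


if __name__ == "__main__":
    info = parse_slow_request(SAMPLE)
    print(info["osd"], info["age_seconds"], info["state"])
```

Feeding every line of the (decompressed) cluster log through `parse_slow_request` and grouping by `osd` and `state` should show whether the slow requests line up with the times the find hangs, and whether they keep pointing at the same OSD.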