Okay, well, let's try and track some of these down. What's the content of the
"ceph.layout" xattr on the directory you're running this test in? Can you
verify that pool 0 is the data pool used by CephFS, and that all reported
slow ops are in that pool? Can you record the IO patterns on an OSD while
this test is being run and see what it does? (I'm wondering if none of the
CephFS pools are in the page cache due to lack of use, and it's seeking all
over trying to find them once the test starts.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Feb 24, 2014 at 11:54 PM, Dan van der Ster
<daniel.vanderster@xxxxxxx> wrote:
> It's really bizarre, since we can easily pump ~1GB/s into the cluster with
> rados bench from a single 10Gig-E client. We only observe this with kernel
> CephFS on that host -- which is why our original theory was something like
> this:
> - the client caches 4GB of writes
> - the client then starts many IOs in parallel to flush that cache
> - each individual 4MB write takes longer than 30s to send from the client
> to the OSD, due to the 1Gig-E network interface on the client
>
> But in that theory we assume quite a lot about the implementations of
> librados and the OSD. Something like this would also explain why only the
> CephFS writes are becoming slow -- the 2 kHz of other (mostly RBD) IOs are
> not affected by this "overload".
>
> Cheers, Dan
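
For reference, the first two checks suggested above can be scripted roughly
as follows. This is only a minimal sketch, assuming a Linux client with the
CephFS kernel mount and admin access to the ceph CLI; the mount path is
hypothetical and should be replaced with the directory used in the test.

#!/usr/bin/env python3
# Sketch: read the directory's "ceph.layout" xattr and list pool ids/names
# from the OSD map, so the pool named in the slow-request messages (pool 0)
# can be matched against the CephFS data pool.
import json
import os
import subprocess

TEST_DIR = "/mnt/cephfs/testdir"  # hypothetical path; use the real test directory

# 1. The layout xattr reports stripe unit/count, object size and the data
#    pool that files created under this directory will be written to.
try:
    layout = os.getxattr(TEST_DIR, "ceph.layout")
    print("ceph.layout on %s: %s" % (TEST_DIR, layout.decode()))
except OSError as exc:
    print("could not read ceph.layout xattr: %s" % exc)

# 2. Dump the OSD map as JSON and print pool ids and names, to confirm
#    whether pool 0 really is the CephFS data pool.
osd_dump = json.loads(
    subprocess.check_output(["ceph", "osd", "dump", "--format=json"])
)
for pool in osd_dump.get("pools", []):
    print("pool %s: %s" % (pool.get("pool"), pool.get("pool_name")))

For the third question (IO patterns on an OSD), one option is to run
something like "iostat -x 1" on the OSD host while the test is in flight,
or to pull dump_historic_ops from the OSD's admin socket, to see whether
the disks are seek-bound during the CephFS writes.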