Re: Cephfs: large files hang

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 17 Dec 2015 15:50:15 -0800

On Thu, Dec 17, 2015 at 11:43 AM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> This is driving me crazy.  I have a ceph filesystem that behaves normally
> when I "ls" files, and behaves normally when I copy smallish files on or off
> of the filesystem, but large files (~ GB size) hang after copying a few
> megabytes.
>
> This is ceph 0.94.5 on CentOS 6.7 with kernel 4.3.3-1.el6.elrepo.x86_64.
> I've tried 64-bit and 32-bit clients with several different kernels, but
> all behave the same.
>
> After copying the first few megabytes I get a stream of "slow request"
> messages for the osds, like this:
>
> 2015-12-17 14:20:40.458306 osd.208 [WRN] slow request 1922.166564 seconds
> old, received at 2015-12-17 13:48:38.291683: osd_op(mds.0.14956:851
> 100010a7b92.0000000d [stat] 0.5d427a9a RETRY=5
> ack+retry+read+rwordered+known_if_redirected e193868) currently reached_pg
>
> It's not a single OSD misbehaving.  It seems to be any OSD.  The OSDs have
> plenty of disk space, and there's nothing in the osd logs that points to a
> problem.
>
> How can I find out what's blocking these requests?

What's the full output of "ceph -s"?

The only time the MDS issues these "stat" ops on objects is during MDS
replay, but the fact that the op is stuck at "reached_pg" in the OSD
makes it look like your OSDs are just very slow. (That could
potentially make the MDS back up far enough to get zapped by the
monitors, but if they're all hitting it, it's probably some kind of
misconfiguration issue.)
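
To check whether the slow requests really are spread across all OSDs and
whether they are all stuck in the same state, something like the following
could tally them. This is only a rough sketch: the function name and the
regular expression are my own, and the pattern is modeled on the single
"slow request" line quoted above, so other Ceph releases may need a
different pattern.

```python
import re
from collections import Counter

# Assumption: the pattern is derived only from the one log line quoted in
# this thread; it is not an official description of the log format.
SLOW_REQUEST = re.compile(
    r"(?P<osd>osd\.\d+) \[WRN\] slow request (?P<age>[\d.]+) seconds old,"
    r".*? currently (?P<state>.+)$"
)


def summarize_slow_requests(log_text):
    """Hypothetical helper: tally slow requests by (OSD, state).

    Returns a Counter keyed by (osd, state) and a dict mapping each OSD
    to the age in seconds of its oldest slow request.
    """
    counts = Counter()
    oldest = {}
    for line in log_text.splitlines():
        match = SLOW_REQUEST.search(line)
        if match is None:
            continue  # not a slow-request warning
        osd = match.group("osd")
        state = match.group("state").strip()
        age = float(match.group("age"))
        counts[(osd, state)] += 1
        oldest[osd] = max(age, oldest.get(osd, 0.0))
    return counts, oldest


if __name__ == "__main__":
    # The warning from the original message, joined back onto one line.
    sample = (
        "2015-12-17 14:20:40.458306 osd.208 [WRN] slow request 1922.166564 "
        "seconds old, received at 2015-12-17 13:48:38.291683: "
        "osd_op(mds.0.14956:851 100010a7b92.0000000d [stat] 0.5d427a9a "
        "RETRY=5 ack+retry+read+rwordered+known_if_redirected e193868) "
        "currently reached_pg"
    )
    counts, oldest = summarize_slow_requests(sample)
    for (osd, state), n in sorted(counts.items()):
        print(osd, state, n, oldest[osd])
```

If nearly every OSD shows up with requests sitting at "reached_pg", that
points more toward something cluster-wide than toward one bad disk.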
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com