On Thu, Dec 17, 2015 at 11:43 AM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> This is driving me crazy. I have a ceph filesystem that behaves normally
> when I "ls" files, and behaves normally when I copy smallish files on or off
> of the filesystem, but large files (~ GB size) hang after copying a few
> megabytes.
>
> This is ceph 0.94.5 under Centos 6.7 under kernel 4.3.3-1.el6.elrepo.x86_64.
> I've tried 64-bit and 32-bit clients with several different kernels, but
> all behave the same.
>
> After copying the first few bytes I get a stream of "slow request" messages
> for the osds, like this:
>
> 2015-12-17 14:20:40.458306 osd.208 [WRN] slow request 1922.166564 seconds
> old, received at 2015-12-17 13:48:38.291683: osd_op(mds.0.14956:851
> 100010a7b92.0000000d [stat] 0.5d427a9a RETRY=5
> ack+retry+read+rwordered+known_if_redirected e193868) currently reached_pg
>
> It's not a single OSD misbehaving. It seems to be any OSD. The OSDs have
> plenty of disk space, and there's nothing in the osd logs that points to a
> problem.
>
> How can I find out what's blocking these requests?

What's the full output of "ceph -s"?

The only time the MDS issues these "stat" ops on objects is during MDS
replay, but the bit where it's blocked on "reached_pg" in the OSD
makes it look like your OSD is just very slow. (Which could
potentially make the MDS back up far enough to get zapped by the
monitors, but in that case it's probably some kind of misconfiguration
issue if they're all hitting it.)
-Greg
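
If it helps to see whether the slow requests really are spread across all OSDs and whether they all sit in the same state, here is a rough Python sketch that tallies them per OSD and per "currently ..." state. It assumes the log format is exactly that of the one message quoted above, with the wrapped lines joined back into a single line; other Ceph versions may print it differently. The names parse_slow_request and summarize are made up for this example and are not part of any Ceph tool.

```python
import re
from collections import Counter

# Pattern modelled on the single "slow request" message quoted in this thread.
# Assumption: each message has been joined back onto one line.
SLOW_RE = re.compile(
    r"(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<osd>osd\.\d+) \[WRN\] slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"osd_op\((?P<op>.*)\) currently (?P<state>.*\S)"
)

# The message from the thread, unwrapped onto one line.
SAMPLE = (
    "2015-12-17 14:20:40.458306 osd.208 [WRN] slow request 1922.166564 "
    "seconds old, received at 2015-12-17 13:48:38.291683: "
    "osd_op(mds.0.14956:851 100010a7b92.0000000d [stat] 0.5d427a9a RETRY=5 "
    "ack+retry+read+rwordered+known_if_redirected e193868) "
    "currently reached_pg"
)


def parse_slow_request(line):
    """Return a dict of fields for a slow-request line, or None if no match."""
    match = SLOW_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["age"] = float(record["age"])  # seconds the request has been waiting
    return record


def summarize(lines):
    """Count slow requests per OSD and per 'currently ...' state."""
    by_osd = Counter()
    by_state = Counter()
    for line in lines:
        record = parse_slow_request(line)
        if record is None:
            continue  # not a slow-request line
        by_osd[record["osd"]] += 1
        by_state[record["state"]] += 1
    return by_osd, by_state


if __name__ == "__main__":
    by_osd, by_state = summarize([SAMPLE])
    print(by_osd.most_common(10))
    print(by_state.most_common())
```

If the counts are spread evenly over many OSDs and nearly all of them read "reached_pg", that fits the picture of the OSDs as a whole being slow rather than one bad disk.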