Re: Cephfs: large files hang

On Fri, Dec 18, 2015 at 7:03 AM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Gregory Farnum <gfarnum@...> writes:
>>
>> What's the full output of "ceph -s"?
>>
>> The only time the MDS issues these "stat" ops on objects is during MDS
>> replay, but the bit where it's blocked on "reached_pg" in the OSD
>> makes it look like your OSD is just very slow. (Which could
>> potentially make the MDS back up far enough to get zapped by the
>> monitors, but in that case it's probably some kind of misconfiguration
>> issue if they're all hitting it.)
>> -Greg
>>
>
> Thanks for the suggestions.  Here's the current messy output of "ceph -s":
>
>     cluster ab8969a6-8b3e-497a-97da-ff06a5476e12
>      health HEALTH_WARN
>             8 pgs down
>             15 pgs incomplete
>             15 pgs stuck inactive
>             15 pgs stuck unclean
>             238 requests are blocked > 32 sec
>      monmap e1: 3 mons at
> {0=192.168.1.31:6789/0,1=192.168.1.32:6789/0,2=192.168.1.33:6789/0}
>             election epoch 42334, quorum 0,1,2 0,1,2
>      mdsmap e78771: 1/1/1 up {0=1=up:active}, 2 up:standby, 1
> up:oneshot-replay(laggy or crashed)
>      osdmap e194472: 58 osds: 58 up, 58 in
>       pgmap v12811210: 1464 pgs, 3 pools, 25856 GB data, 8873 kobjects
>             52265 GB used, 55591 GB / 105 TB avail
>                 1447 active+clean
>                    8 down+incomplete
>                    7 incomplete
>                    2 active+clean+scrubbing
>
>
> The spurious "oneshot-replay" mds entry was caused by a typo in the mds name
> when I earlier tried to run "ceph-mds --journal-check".
>
> I'm currently trying to copy a large file off the ceph filesystem, and
> it's hung after 12582912 kB.  The osd log is telling me things like:
>
> 2015-12-18 09:25:22.698124 7f5c0540a700  0 log_channel(cluster) log [WRN] :
> slow request 3840.705492 seconds old, received at 2015-12-18
> 08:21:21.992542: osd_op(mds.0.14959:1257 100010a7ba7.00000000 [create
> 0~0,setxattr parent (293)] 0.beb25de8 ondisk+write+known_if_redirected
> e194470) currently reached_pg
>
> dmesg, etc., show no errors for the osd disk or anything else, and the load
> on the osd server is negligible:
>
>    09:53:01 up 17:54,  1 user,  load average: 0.05, 0.43, 0.42
>
> When logged into the osd server, I can browse around on the osd's filesystem
> with no sluggishness:
>
> ls /var/lib/ceph/osd/ceph-406/current
> 0.10c_head  0.4d_head   1.164_head  1.a0_head   2.190_head  commit_op_seq
> 0.10_head   0.57_head   1.18a_head  1.a3_head   2.46_head   meta
> 0.151_head  0.9a_head   1.18c_head  1.e7_head   2.4b_head   nosnap
> 0.165_head  0.9f_head   1.191_head  1.f_head    2.55_head   omap
> 0.18b_head  0.a1_head   1.47_head   2.10a_head  2.9d_head
> 0.18d_head  0.a4_head   1.4c_head   2.14f_head  2.9f_head
> 0.192_head  0.e8_head   1.56_head   2.163_head  2.a2_head
> 0.1b2_head  1.10b_head  1.99_head   2.189_head  2.e6_head
> 0.48_head   1.150_head  1.9e_head   2.18b_head  2.e_head
>
> ifconfig shows no errors on the osd server (public or cluster network):
>
> eth0      Link encap:Ethernet  HWaddr 00:25:90:67:2A:2C
>           inet addr:192.168.1.23  Bcast:192.168.3.255  Mask:255.255.252.0
>           inet6 addr: fe80::225:90ff:fe67:2a2c/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:13016012 errors:1 dropped:6 overruns:0 frame:1
>           TX packets:12839326 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1515148248 (1.4 GiB)  TX bytes:1533480424 (1.4 GiB)
>           Interrupt:16 Memory:fa9e0000-faa00000
>
> eth1      Link encap:Ethernet  HWaddr 00:25:90:67:2A:2D
>           inet addr:192.168.12.23  Bcast:192.168.15.255  Mask:255.255.252.0
>           inet6 addr: fe80::225:90ff:fe67:2a2d/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:59263760 errors:0 dropped:18476 overruns:0 frame:0
>           TX packets:129010105 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:60511361818 (56.3 GiB)  TX bytes:173505625103 (161.5 GiB)
>           Interrupt:17 Memory:faae0000-fab00000
>
> Snooping with wireshark, I see traffic between osds on the cluster network,
> and traffic between clients and osds on the public network.
>
> The "incomplete" pgs are associated with a dead osd that's been removed from
> the cluster for a long time (since before the current problem).

Nonetheless, it's probably your down or incomplete PGs that are causing
the issue. You can check by seeing whether seed 0.5d427a9a (from that
blocked request you mentioned) belongs to one of the dead PGs.
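For example (substituting your CephFS data pool's name and the pg id in
question, neither of which I know from here), something like

    ceph osd map <data-pool> 100010a7ba7.00000000
    ceph health detail
    ceph pg <pgid> query

should show which PG that object hashes to, which PGs are down or
incomplete, and why a given PG is stuck (the query output should list the
OSDs it is still waiting on).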
-Greg

>
> I thought this problem might be due to something wrong in the 4.* kernel,
> but I've reverted the ceph cluster to the kernel it was using the last time
> I know things were working (3.19.3-1.el6.elrepo.x86_64), and the behavior
> is the same.
>
> I'm still looking for something that might tell me what's causing the osd
> requests to hang.
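
It would also be worth looking at the OSD admin socket on the node holding
the blocked ops (osd.406 in your listing above); something along the lines of

    ceph daemon osd.406 dump_ops_in_flight
    ceph daemon osd.406 dump_historic_ops

should show each stuck op and the last event it reached, which usually
points at the PG (or peering state) it's queued behind.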
>
> Bryan
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


