On Wed, May 13, 2015 at 12:08 PM, Daniel Takatori Ohara <dtohara@xxxxxxxxxxxxx> wrote:
> Hi,
>
> We have a small ceph cluster with 4 OSDs and 1 MDS.
>
> I run Ubuntu 14.04 with 3.13.0-52-generic on the clients, and CentOS 6.6
> with 2.6.32-504.16.2.el6.x86_64 on the servers.
>
> The version of Ceph is 0.94.1.
>
> Sometimes CephFS freezes, and dmesg shows me the following messages:
>
> May 13 15:53:10 blade02 kernel: [93297.784094] ------------[ cut here ]------------
> May 13 15:53:10 blade02 kernel: [93297.784121] WARNING: CPU: 10 PID: 299 at /build/buildd/linux-3.13.0/fs/ceph/inode.c:701 fill_inode.isra.8+0x9ed/0xa00 [ceph]()
> May 13 15:53:10 blade02 kernel: [93297.784129] Modules linked in: 8021q garp stp mrp llc nfsv3 rpcsec_gss_krb5 nfsv4 ceph libceph libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp ipmi_devintf gpi
> May 13 15:53:10 blade02 kernel: [93297.784204] CPU: 10 PID: 299 Comm: kworker/10:1 Tainted: G W 3.13.0-52-generic #86-Ubuntu
> May 13 15:53:10 blade02 kernel: [93297.784207] Hardware name: Dell Inc. PowerEdge M520/050YHY, BIOS 2.1.3 01/20/2014
> May 13 15:53:10 blade02 kernel: [93297.784221] Workqueue: ceph-msgr con_work [libceph]
> May 13 15:53:10 blade02 kernel: [93297.784225] 0000000000000009 ffff880801093a28 ffffffff8172266e 0000000000000000
> May 13 15:53:10 blade02 kernel: [93297.784233] ffff880801093a60 ffffffff810677fd 00000000ffffffea 0000000000000036
> May 13 15:53:10 blade02 kernel: [93297.784239] 0000000000000000 0000000000000000 ffffc9001b73f9d8 ffff880801093a70
> May 13 15:53:10 blade02 kernel: [93297.784246] Call Trace:
> May 13 15:53:10 blade02 kernel: [93297.784257] [<ffffffff8172266e>] dump_stack+0x45/0x56
> May 13 15:53:10 blade02 kernel: [93297.784264] [<ffffffff810677fd>] warn_slowpath_common+0x7d/0xa0
> May 13 15:53:10 blade02 kernel: [93297.784269] [<ffffffff810678da>] warn_slowpath_null+0x1a/0x20
> May 13 15:53:10 blade02 kernel: [93297.784280] [<ffffffffa046facd>] fill_inode.isra.8+0x9ed/0xa00 [ceph]
> May 13 15:53:10 blade02 kernel: [93297.784290] [<ffffffffa046e3cd>] ? ceph_alloc_inode+0x1d/0x4e0 [ceph]
> May 13 15:53:10 blade02 kernel: [93297.784302] [<ffffffffa04704cf>] ceph_readdir_prepopulate+0x27f/0x6d0 [ceph]
> May 13 15:53:10 blade02 kernel: [93297.784318] [<ffffffffa048a704>] handle_reply+0x854/0xc70 [ceph]
> May 13 15:53:10 blade02 kernel: [93297.784331] [<ffffffffa048c3f7>] dispatch+0xe7/0xa90 [ceph]
> May 13 15:53:10 blade02 kernel: [93297.784342] [<ffffffffa02a4a78>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
> May 13 15:53:10 blade02 kernel: [93297.784354] [<ffffffffa02a7a9b>] try_read+0x4ab/0x10d0 [libceph]
> May 13 15:53:10 blade02 kernel: [93297.784365] [<ffffffffa02a9418>] ? try_write+0x9a8/0xdb0 [libceph]
> May 13 15:53:10 blade02 kernel: [93297.784373] [<ffffffff8101bc23>] ? native_sched_clock+0x13/0x80
> May 13 15:53:10 blade02 kernel: [93297.784379] [<ffffffff8109d585>] ? sched_clock_cpu+0xb5/0x100
> May 13 15:53:10 blade02 kernel: [93297.784390] [<ffffffffa02a98d9>] con_work+0xb9/0x640 [libceph]
> May 13 15:53:10 blade02 kernel: [93297.784398] [<ffffffff81083aa2>] process_one_work+0x182/0x450
> May 13 15:53:10 blade02 kernel: [93297.784403] [<ffffffff81084891>] worker_thread+0x121/0x410
> May 13 15:53:10 blade02 kernel: [93297.784409] [<ffffffff81084770>] ? rescuer_thread+0x430/0x430
> May 13 15:53:10 blade02 kernel: [93297.784414] [<ffffffff8108b5d2>] kthread+0xd2/0xf0
> May 13 15:53:10 blade02 kernel: [93297.784420] [<ffffffff8108b500>] ?
> kthread_create_on_node+0x1c0/0x1c0
> May 13 15:53:10 blade02 kernel: [93297.784426] [<ffffffff817330cc>] ret_from_fork+0x7c/0xb0
> May 13 15:53:10 blade02 kernel: [93297.784431] [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
> May 13 15:53:10 blade02 kernel: [93297.784434] ---[ end trace 05d3f5ee1f31bc67 ]---
> May 13 15:53:10 blade02 kernel: [93297.784437] ceph: fill_inode badness on ffff8807f7eaa5c0

I don't follow the kernel stuff too closely, but the CephFS kernel client is still improving quite rapidly, and 3.13 is old at this point. You could try upgrading to something newer. Zheng might also know what's going on and whether it's been fixed.
-Greg
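
As a rough sketch of what "upgrading to something newer" can look like on an Ubuntu 14.04 client, using the stock HWE (hardware enablement) kernel packages. The exact linux-generic-lts-* package name below is illustrative; which enablement stack is current varies, so check what your mirror offers:

    # check what the client is currently running
    uname -r            # e.g. 3.13.0-52-generic
    ceph --version      # e.g. ceph version 0.94.1 (if the ceph CLI is installed on the client)

    # pull in a newer HWE kernel series and reboot into it
    sudo apt-get update
    sudo apt-get install linux-generic-lts-vivid
    sudo reboot

This doesn't guarantee the fill_inode warning goes away; it just moves the client onto a kernel series that still receives CephFS client fixes.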