Re: ls/file access hangs on a single ceph directory

Sage Weil <sage@xxxxxxxxxxx> · Wed, 23 Oct 2013 15:54:06 -0700 (PDT)

If you do

 ceph mds tell 0 dumpcache /tmp/foo

it will dump the mds cache, and

 ceph-post-file /tmp/foo

will send the file to ceph.com so we can get some clue as to what happened.  I
suspect that restarting the ceph-mds process will resolve the hang.
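
If it helps, the two steps could be wrapped in a small script along these
lines. This is only a rough sketch: the run helper, the collect_mds_cache
function and the DRY_RUN variable are names made up for illustration, and it
assumes the script runs on the host where ceph-mds is running, since that
daemon is what writes the dump file.

```shell
#!/bin/sh
# Sketch only: wraps the two commands from this mail.
# run, collect_mds_cache and DRY_RUN are hypothetical names.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: collect_mds_cache [mds-rank] [dump-path]
collect_mds_cache() {
    rank="${1:-0}"
    dump="${2:-/tmp/foo}"

    # Step 1: ask the MDS to dump its cache to a file.
    run ceph mds tell "$rank" dumpcache "$dump" || return 1

    # Assumption: the dump lands on this host; bail out if it is missing or empty.
    if [ -z "${DRY_RUN:-}" ] && [ ! -s "$dump" ]; then
        echo "no dump found at $dump (is this the MDS host?)" >&2
        return 1
    fi

    # Step 2: upload the dump.
    run ceph-post-file "$dump"
}
```

Calling it as "DRY_RUN=1 collect_mds_cache 0 /tmp/foo" just prints the two
commands, which is a cheap way to check the arguments before running it for
real.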

Thanks!
sage

On Wed, 23 Oct 2013, Michael wrote:

> Trying to gather some more info.
> 
> CentOS - hanging ls
> [root@srv ~]# cat /proc/14614/stack
> [<ffffffffa02d3e81>] wait_answer_interruptible+0x81/0xc0 [fuse]
> [<ffffffffa02d415b>] fuse_request_send+0x1cb/0x290 [fuse]
> [<ffffffffa02d652c>] fuse_do_getattr+0x10c/0x2c0 [fuse]
> [<ffffffffa02d6755>] fuse_update_attributes+0x75/0x80 [fuse]
> [<ffffffffa02d67b3>] fuse_getattr+0x53/0x60 [fuse]
> [<ffffffff81186d51>] vfs_getattr+0x51/0x80
> [<ffffffff81186de0>] vfs_fstatat+0x60/0x80
> [<ffffffff81186f2b>] vfs_stat+0x1b/0x20
> [<ffffffff81186f54>] sys_newstat+0x24/0x50
> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Ubuntu - hanging ls
> root@srv:~# cat /proc/30012/stack
> [<ffffffffa061d04b>] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
> [<ffffffffa0608f37>] ceph_do_getattr+0xe7/0x120 [ceph]
> [<ffffffffa0608f94>] ceph_getattr+0x24/0x100 [ceph]
> [<ffffffff8118d42e>] vfs_getattr+0x4e/0x80
> [<ffffffff8118d4ae>] vfs_fstatat+0x4e/0x70
> [<ffffffff8118d4ee>] vfs_lstat+0x1e/0x20
> [<ffffffff8118d68a>] sys_newlstat+0x1a/0x40
> [<ffffffff816a6ba9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Started occurring shortly (within an hour or so) after adding a pool, not sure
> if that's relevant yet.
> 
> -Michael
> 
> On 23/10/2013 21:10, Michael wrote:
> > I have a filesystem shared by several systems, mounted on 2 ceph nodes with a
> > 3rd as a reference monitor.
> > It's been in use for a couple of months now, but suddenly the root directory
> > of the mount has become inaccessible and requests for files in it just hang.
> > There are no ceph errors reported before or after, and subdirectories of the
> > directory can still be used (and are currently being used by VMs running
> > from it). It's mounted in a mixed environment: the kernel driver on Ubuntu
> > and ceph-fuse on CentOS.
> > 
> >  cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
> >    health HEALTH_OK
> >    monmap e1: 3 mons at {srv10=##:6789/0,srv11=##:6789/0,srv8=##:6789/0}, election epoch 96, quorum 0,1,2 srv10,srv11,srv8
> >    osdmap e2873: 6 osds: 6 up, 6 in
> >    pgmap v2451618: 728 pgs: 728 active+clean; 128 GB data, 260 GB used, 3929 GB / 4189 GB avail; 30365B/s wr, 5op/s
> >    mdsmap e51: 1/1/1 up {0=srv10=up:active}
> > 
> > I have done a full deep scrub/repair cycle on all of the OSDs, which came
> > back fine, so I'm not really sure where to start looking to find out what's
> > wrong with it.
> > 
> > Any ideas?
> > 
> > -Michael
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 