I have a small cluster of 4 machines and quite a few drives. After about 2-3 weeks CephFS fails: it is no longer properly mounted at /mnt/cephfs, which of course causes the VMs running from it to fail too.
In /var/log/syslog the message "/mnt/cephfs: File exists at /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" is repeated over and over.
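When it happens, I can check whether /mnt/cephfs is still a live mount with something like the following (I'd expect the fstype to show up as "ceph" for the kernel client, or "fuse.ceph-fuse" if it was mounted via ceph-fuse):
# grep /mnt/cephfs /proc/mounts
# stat -f /mnt/cephfs
(the second command shows whether the mountpoint still responds at all or just returns an error)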
There doesn't seem to be anything wrong with Ceph itself at the time:
# ceph -s
cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
health HEALTH_WARN clock skew detected on mon.s1
monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
osdmap e5577: 19 osds: 19 up, 19 in
pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
1636 GB used, 9713 GB / 11358 GB avail
384 active+clean
client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
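The only warning is the clock skew on mon.s1. I assume I can check that on the monitor hosts with something like this (assuming ntpd is used for time sync):
# ceph health detail
(reports the actual size of the skew on mon.s1)
# ntpq -p
(run on h1 and s1 to confirm ntpd has usable peers)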
# ceph osd tree
# id weight type name up/down reweight
-1 11.13 root default
-2 8.14 host h1
1 0.9 osd.1 up 1
3 0.9 osd.3 up 1
4 0.9 osd.4 up 1
5 0.68 osd.5 up 1
6 0.68 osd.6 up 1
7 0.68 osd.7 up 1
8 0.68 osd.8 up 1
9 0.68 osd.9 up 1
10 0.68 osd.10 up 1
11 0.68 osd.11 up 1
12 0.68 osd.12 up 1
-3 0.45 host s3
2 0.45 osd.2 up 1
-4 0.9 host s2
13 0.9 osd.13 up 1
-5 1.64 host s1
14 0.29 osd.14 up 1
0 0.27 osd.0 up 1
15 0.27 osd.15 up 1
16 0.27 osd.16 up 1
17 0.27 osd.17 up 1
18 0.27 osd.18 up 1
When I "umount -l /mnt/cephfs" and then "mount -a" after that, the the ceph volume is loaded again. I can restart the VM's and all seems well.
I can't find any errors pertaining to CephFS in the other logs either.
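As a stop-gap I could put that remount workaround into a small cron script, roughly like the sketch below (untested; it assumes the mount is defined in /etc/fstab so a plain "mount /mnt/cephfs" works, and it obviously doesn't fix the underlying problem):
#!/bin/sh
# remount /mnt/cephfs if it is no longer listed as a ceph or ceph-fuse mount
MNT=/mnt/cephfs
if ! grep -qE " $MNT (ceph|fuse\.ceph)" /proc/mounts; then
    logger "cephfs watchdog: $MNT not mounted, remounting"
    umount -l "$MNT" 2>/dev/null
    mount "$MNT"
fi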
System information:
Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux
I can't upgrade to kernel v3.13 since I'm using containers.
Of course, I want to prevent this from happening! How do I troubleshoot this, and what could be causing it?
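In the meantime I can try to gather more evidence the next time it happens, roughly along these lines:
# dmesg | grep -iE 'ceph|libceph'
(kernel client messages, if the kernel client is in use)
# grep -iE 'ceph|mds|mon' /var/log/syslog
(anything logged around the time the mount disappeared)
# ps aux | grep ceph-fuse
(whether the ceph-fuse process is still alive, if ceph-fuse is in use)
but I'd appreciate pointers on what else to look at.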
regards
Roland Giesler