Re: cephfs unmounts itself from time to time


On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> I have a small cluster of 4 machines and quite a few drives.  After about
> 2-3 weeks cephfs fails.  It's not properly mounted anymore in /mnt/cephfs,
> which of course causes the running VMs to fail too.
>
> In /var/log/syslog I have "/mnt/cephfs: File exists at
> /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
>
> There doesn't seem to be anything wrong with ceph at the time.
>
> # ceph -s
>     cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
>      health HEALTH_WARN clock skew detected on mon.s1
>      monmap e2: 2 mons at
> {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312,
> quorum 0,1 h1,s1
>      mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
>      osdmap e5577: 19 osds: 19 up, 19 in
>       pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
>             1636 GB used, 9713 GB / 11358 GB avail
>                  384 active+clean
>   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
> # ceph osd tree
> # id  weight   type name    up/down  reweight
> -1    11.13    root default
> -2     8.14        host h1
>  1     0.9             osd.1    up    1
>  3     0.9             osd.3    up    1
>  4     0.9             osd.4    up    1
>  5     0.68            osd.5    up    1
>  6     0.68            osd.6    up    1
>  7     0.68            osd.7    up    1
>  8     0.68            osd.8    up    1
>  9     0.68            osd.9    up    1
> 10     0.68            osd.10   up    1
> 11     0.68            osd.11   up    1
> 12     0.68            osd.12   up    1
> -3     0.45        host s3
>  2     0.45            osd.2    up    1
> -4     0.9         host s2
> 13     0.9             osd.13   up    1
> -5     1.64        host s1
> 14     0.29            osd.14   up    1
>  0     0.27            osd.0    up    1
> 15     0.27            osd.15   up    1
> 16     0.27            osd.16   up    1
> 17     0.27            osd.17   up    1
> 18     0.27            osd.18   up    1
>
> When I "umount -l /mnt/cephfs" and then "mount -a" after that, the the ceph
> volume is loaded again.  I can restart the VM's and all seems well.
>
> I can't find errors pertaining to cephfs in the other logs either.
>
> System information:
>
> Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

I'm not sure what version of Linux this really is (I assume it's a
vendor kernel of some kind!), but it's definitely an old one. The
CephFS kernel client sees continuous stability improvements, so this
could be any of a number of bugs that have since been fixed.

If you can't upgrade the kernel, you might try the ceph-fuse client
instead, since you can run a much newer version of it even on the old
kernel (rough sketch below). Other than that, can you include more
information about exactly what you mean when you say CephFS unmounts
itself?
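
As a rough sketch of the ceph-fuse route (assuming the client keyring
and ceph.conf are already in /etc/ceph on that node, and using the
monitor address from your ceph -s output), it would look something
like:

  # apt-get install ceph-fuse
  # umount -l /mnt/cephfs                        (drop the stale kernel mount)
  # ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs (mount via the FUSE client)

If you want it to come back on boot, an fstab line along the lines of

  id=admin  /mnt/cephfs  fuse.ceph  defaults  0  0

should do it; adjust "id=" to whichever cephx user you actually mount
with, since "admin" here is just an example.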
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
