On 15 June 2015 at 13:09, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> I have a small cluster of 4 machines and quite a few drives. After about
> 2-3 weeks cephfs fails. It's not properly mounted anymore in /mnt/cephfs,
> which of course causes the running VMs to fail too.
>
> In /var/log/syslog I have "/mnt/cephfs: File exists at
> /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
>
> There doesn't seem to be anything wrong with ceph at the time.
>
> # ceph -s
>     cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
>      health HEALTH_WARN clock skew detected on mon.s1
>      monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
>      mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
>      osdmap e5577: 19 osds: 19 up, 19 in
>       pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
>             1636 GB used, 9713 GB / 11358 GB avail
>                  384 active+clean
>   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
> # ceph osd tree
> # id   weight  type name       up/down  reweight
> -1     11.13   root default
> -2     8.14      host h1
> 1      0.9         osd.1       up       1
> 3      0.9         osd.3       up       1
> 4      0.9         osd.4       up       1
> 5      0.68        osd.5       up       1
> 6      0.68        osd.6       up       1
> 7      0.68        osd.7       up       1
> 8      0.68        osd.8       up       1
> 9      0.68        osd.9       up       1
> 10     0.68        osd.10      up       1
> 11     0.68        osd.11      up       1
> 12     0.68        osd.12      up       1
> -3     0.45      host s3
> 2      0.45        osd.2       up       1
> -4     0.9       host s2
> 13     0.9         osd.13      up       1
> -5     1.64      host s1
> 14     0.29        osd.14      up       1
> 0      0.27        osd.0       up       1
> 15     0.27        osd.15      up       1
> 16     0.27        osd.16      up       1
> 17     0.27        osd.17      up       1
> 18     0.27        osd.18      up       1
>
> When I "umount -l /mnt/cephfs" and then "mount -a" after that, the ceph
> volume is loaded again. I can restart the VMs and all seems well.
>
> I can't find errors pertaining to cephfs in the other logs either.
>
> System information:
>
> Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux
I'm not sure what version of Linux this really is (I assume it's a
vendor kernel of some kind!), but it's definitely an old one! CephFS
sees pretty continuous improvements to stability and it could be any
number of resolved bugs.
This is the stock standard installation of Proxmox with CephFS.
If you can't upgrade the kernel, you might try out the ceph-fuse
client instead as you can run a much newer and more up-to-date version
of it, even on the old kernel.
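Roughly, swapping the kernel mount for ceph-fuse on one node might look
like this (the monitor address is taken from your monmap above; it
assumes the admin keyring is in the default /etc/ceph location, and
Proxmox's storage config may manage the mount differently):

# stop using the kernel client for this mountpoint
umount /mnt/cephfs
# mount the same filesystem with the userspace FUSE client instead;
# ceph-fuse ships with the ceph packages and can be upgraded
# independently of the kernel
ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs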
I'm under the impression that CephFS is the filesystem implemented by ceph-fuse. Is it not?
Other than that, can you include more
information about exactly what you mean when saying CephFS unmounts
itself?
Everything runs fine for weeks. Then suddenly a user reports that a VM is not functioning anymore. On investigation it transpires that CephFS is not mounted anymore and the error I reported is logged.
I can't see anything else wrong at this stage. Ceph is running and the OSDs are all up.
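For now the only recovery is the sequence I mentioned above, roughly
(assuming the default /mnt/cephfs mountpoint and the fstab entry that
mount -a picks up):

# lazily unmount the stale mountpoint so anything still holding it lets go
umount -l /mnt/cephfs
# remount everything in /etc/fstab that isn't currently mounted,
# which brings the cephfs mount back
mount -a
# verify it is mounted again before restarting the VMs
mount | grep /mnt/cephfs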
thanks again
Roland
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com