On 15 June 2015 at 13:09, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> I have a small cluster of 4 machines and quite a few drives. After about
> 2-3 weeks cephfs fails. It's not properly mounted anymore in /mnt/cephfs,
> which of course causes the running VMs to fail too.
>
> In /var/log/syslog I have "/mnt/cephfs: File exists at
> /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
>
> There doesn't seem to be anything wrong with ceph at the time.
>
> # ceph -s
>     cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
>      health HEALTH_WARN clock skew detected on mon.s1
>      monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
>      mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
>      osdmap e5577: 19 osds: 19 up, 19 in
>       pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
>             1636 GB used, 9713 GB / 11358 GB avail
>                  384 active+clean
>   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
> # ceph osd tree
> # id   weight  type name       up/down  reweight
> -1     11.13   root default
> -2     8.14      host h1
> 1      0.9         osd.1       up       1
> 3      0.9         osd.3       up       1
> 4      0.9         osd.4       up       1
> 5      0.68        osd.5       up       1
> 6      0.68        osd.6       up       1
> 7      0.68        osd.7       up       1
> 8      0.68        osd.8       up       1
> 9      0.68        osd.9       up       1
> 10     0.68        osd.10      up       1
> 11     0.68        osd.11      up       1
> 12     0.68        osd.12      up       1
> -3     0.45      host s3
> 2      0.45        osd.2       up       1
> -4     0.9       host s2
> 13     0.9         osd.13      up       1
> -5     1.64      host s1
> 14     0.29        osd.14      up       1
> 0      0.27        osd.0       up       1
> 15     0.27        osd.15      up       1
> 16     0.27        osd.16      up       1
> 17     0.27        osd.17      up       1
> 18     0.27        osd.18      up       1
>
> When I "umount -l /mnt/cephfs" and then "mount -a" after that, the ceph
> volume is loaded again. I can restart the VMs and all seems well.
>
> I can't find errors pertaining to cephfs in the other logs either.
>
> System information:
>
> Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux
I'm not sure what version of Linux this really is (I assume it's a
vendor kernel of some kind!), but it's definitely an old one! CephFS
sees pretty continuous improvements to stability and it could be any
number of resolved bugs.
This is the stock standard installation of Proxmox with CephFS.
If you can't upgrade the kernel, you might try out the ceph-fuse
client instead as you can run a much newer and more up-to-date version
of it, even on the old kernel.
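Roughly, swapping the kernel mount for ceph-fuse on one node might look
like this (the monitor address is taken from your monmap above; it
assumes the admin keyring is in the default /etc/ceph location, and
Proxmox's storage config may manage the mount differently):

# stop using the kernel client for this mountpoint
umount /mnt/cephfs
# mount the same filesystem with the userspace FUSE client instead;
# ceph-fuse ships with the ceph packages and can be upgraded
# independently of the kernel
ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs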
I'm under the impression that CephFS is the filesystem implemented by ceph-fuse. Is it not?
Other than that, can you include more
information about exactly what you mean when saying CephFS unmounts
itself?
Everything runs fine for weeks. Then suddenly a user reports that a VM is not functioning anymore. On investigation it transpires that CephFS is not mounted anymore and the error I reported is logged.
I can't see anything else wrong at this stage. Ceph is running and the OSDs are all up.
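For now the only recovery is the sequence I mentioned above, roughly
(assuming the default /mnt/cephfs mountpoint and the fstab entry that
mount -a picks up):

# lazily unmount the stale mountpoint so anything still holding it lets go
umount -l /mnt/cephfs
# remount everything in /etc/fstab that isn't currently mounted,
# which brings the cephfs mount back
mount -a
# verify it is mounted again before restarting the VMs
mount | grep /mnt/cephfs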
thanks again
Roland
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com