I have a small cluster of 4 machines and quite a few drives. After about 2-3 weeks CephFS fails: it is no longer properly mounted at /mnt/cephfs, which of course causes the VMs running from it to fail too.
In /var/log/syslog the message "/mnt/cephfs: File exists at /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" is repeated over and over.
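When it happens, I can check whether /mnt/cephfs is still a live mount with something like the following (I'd expect the fstype to show up as "ceph" for the kernel client, or "fuse.ceph-fuse" if it was mounted via ceph-fuse):
# grep /mnt/cephfs /proc/mounts
# stat -f /mnt/cephfs
(the second command shows whether the mountpoint still responds at all or just returns an error)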
There doesn't seem to be anything wrong with Ceph itself at the time:
# ceph -s
cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
health HEALTH_WARN clock skew detected on mon.s1
monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
osdmap e5577: 19 osds: 19 up, 19 in
pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
1636 GB used, 9713 GB / 11358 GB avail
384 active+clean
client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
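The only warning is the clock skew on mon.s1. I assume I can check that on the monitor hosts with something like this (assuming ntpd is used for time sync):
# ceph health detail
(reports the actual size of the skew on mon.s1)
# ntpq -p
(run on h1 and s1 to confirm ntpd has usable peers)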
# ceph osd tree
# id weight type name up/down reweight
-1 11.13 root default
-2 8.14 host h1
1 0.9 osd.1 up 1
3 0.9 osd.3 up 1
4 0.9 osd.4 up 1
5 0.68 osd.5 up 1
6 0.68 osd.6 up 1
7 0.68 osd.7 up 1
8 0.68 osd.8 up 1
9 0.68 osd.9 up 1
10 0.68 osd.10 up 1
11 0.68 osd.11 up 1
12 0.68 osd.12 up 1
-3 0.45 host s3
2 0.45 osd.2 up 1
-4 0.9 host s2
13 0.9 osd.13 up 1
-5 1.64 host s1
14 0.29 osd.14 up 1
0 0.27 osd.0 up 1
15 0.27 osd.15 up 1
16 0.27 osd.16 up 1
17 0.27 osd.17 up 1
18 0.27 osd.18 up 1
When I "umount -l /mnt/cephfs" and then "mount -a" after that, the the ceph volume is loaded again. I can restart the VM's and all seems well.
I can't find any errors pertaining to CephFS in the other logs either.
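As a stop-gap I could put that remount workaround into a small cron script, roughly like the sketch below (untested; it assumes the mount is defined in /etc/fstab so a plain "mount /mnt/cephfs" works, and it obviously doesn't fix the underlying problem):
#!/bin/sh
# remount /mnt/cephfs if it is no longer listed as a ceph or ceph-fuse mount
MNT=/mnt/cephfs
if ! grep -qE " $MNT (ceph|fuse\.ceph)" /proc/mounts; then
    logger "cephfs watchdog: $MNT not mounted, remounting"
    umount -l "$MNT" 2>/dev/null
    mount "$MNT"
fi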
System information:
Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux
I can't upgrade to kernel v3.13 since I'm using containers.
Of course, I want to prevent this from happening! How do I troubleshoot this, and what could be causing it?
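In the meantime I can try to gather more evidence the next time it happens, roughly along these lines:
# dmesg | grep -iE 'ceph|libceph'
(kernel client messages, if the kernel client is in use)
# grep -iE 'ceph|mds|mon' /var/log/syslog
(anything logged around the time the mount disappeared)
# ps aux | grep ceph-fuse
(whether the ceph-fuse process is still alive, if ceph-fuse is in use)
but I'd appreciate pointers on what else to look at.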
regards
Roland Giesler