Re: cephfs unmounts itself from time to time

On Thu, Jun 18, 2015 at 10:15 PM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> On 15 June 2015 at 13:09, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler <roland@xxxxxxxxxxxxxx>
>> wrote:
>> > I have a small cluster of 4 machines and quite a few drives.  After
>> > about 2
>> > - 3 weeks cephfs fails.  It's not properly mounted anymore in
>> > /mnt/cephfs,
>> > which of course causes the running VMs to fail too.
>> >
>> > In /var/log/syslog I have "/mnt/cephfs: File exists at
>> > /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
>> >
>> > There doesn't seem to be anything wrong with ceph at the time.
>> >
>> > # ceph -s
>> >     cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
>> >      health HEALTH_WARN clock skew detected on mon.s1
>> >      monmap e2: 2 mons at
>> > {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312,
>> > quorum 0,1 h1,s1
>> >      mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
>> >      osdmap e5577: 19 osds: 19 up, 19 in
>> >       pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
>> >             1636 GB used, 9713 GB / 11358 GB avail
>> >                  384 active+clean
>> >   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
>> > # ceph osd tree
>> > # id  weight   type name    up/down  reweight
>> > -1    11.13    root default
>> > -2     8.14        host h1
>> >  1     0.9             osd.1    up    1
>> >  3     0.9             osd.3    up    1
>> >  4     0.9             osd.4    up    1
>> >  5     0.68            osd.5    up    1
>> >  6     0.68            osd.6    up    1
>> >  7     0.68            osd.7    up    1
>> >  8     0.68            osd.8    up    1
>> >  9     0.68            osd.9    up    1
>> > 10     0.68            osd.10   up    1
>> > 11     0.68            osd.11   up    1
>> > 12     0.68            osd.12   up    1
>> > -3     0.45        host s3
>> >  2     0.45            osd.2    up    1
>> > -4     0.9         host s2
>> > 13     0.9             osd.13   up    1
>> > -5     1.64        host s1
>> > 14     0.29            osd.14   up    1
>> >  0     0.27            osd.0    up    1
>> > 15     0.27            osd.15   up    1
>> > 16     0.27            osd.16   up    1
>> > 17     0.27            osd.17   up    1
>> > 18     0.27            osd.18   up    1
>> >
>> > When I "umount -l /mnt/cephfs" and then "mount -a" after that, the ceph
>> > volume is loaded again.  I can restart the VMs and all seems well.
>> >
>> > I can't find errors pertaining to cephfs in the other logs either.
>> >
>> > System information:
>> >
>> > Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64
>> > GNU/Linux
>>
>> I'm not sure what version of Linux this really is (I assume it's a
>> vendor kernel of some kind!), but it's definitely an old one! CephFS
>> sees pretty continuous improvements to stability, so this could be any
>> number of bugs that have since been fixed.
>
>
> This is the stock standard installation of Proxmox with CephFS.
>
>
>>
>> If you can't upgrade the kernel, you might try out the ceph-fuse
>> client instead as you can run a much newer and more up-to-date version
>> of it, even on the old kernel.
>
>
> I'm under the impression that CephFS is the filesystem implemented by
> ceph-fuse. Is it not?

Of course it is, but it's a different implementation than the kernel
client and often has different bugs. ;) Plus you can get a newer
version of it easily.
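
Something along these lines should work (an untested sketch; the monitor
address is taken from your "ceph -s" output above, and the client name and
fstab syntax may need adjusting for your ceph-fuse version):

  # apt-get install ceph-fuse
  # umount /mnt/cephfs                           # drop the kernel-client mount first
  # ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs

  # /etc/fstab entry for the FUSE client, so "mount -a" keeps working:
  id=admin  /mnt/cephfs  fuse.ceph  defaults  0  0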

>> Other than that, can you include more
>> information about exactly what you mean when saying CephFS unmounts
>> itself?
>
>
> Everything runs fine for weeks.  Then suddenly a user reports that a VM is
> not functioning anymore.  On investigation it transpires that CephFS is not
> mounted anymore and the error I reported is logged.
>
> I can't see anything else wrong at this stage.  ceph is running, the osd are
> all up.
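
As a stopgap, the "umount -l" / "mount -a" workaround you describe could be
run periodically from cron until the real cause is found. A minimal sketch
(the mountpoint and log tag are assumptions, and it assumes an fstab entry
for /mnt/cephfs; it only papers over the symptom):

  #!/bin/sh
  # remount-cephfs.sh: re-mount /mnt/cephfs if it has silently disappeared
  if ! mountpoint -q /mnt/cephfs; then
      logger -t cephfs-check "/mnt/cephfs not mounted, remounting"
      umount -l /mnt/cephfs 2>/dev/null   # clear any stale entry first
      mount /mnt/cephfs                   # relies on the fstab entry
  fi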

Maybe one of our kernel devs has a better idea, but I've no clue how to
debug this if you can't give me any information about how CephFS came
to be unmounted. It just doesn't make any sense to me. :(
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


