Cephfs truncating files

"Tobias Prousa" <topro@xxxxxx> · Fri, 08 Feb 2013 11:00:38 +0100

Hi,

I am evaluating a small Ceph cluster with three nodes, each running one mon, one mds, and 2-3 osds. The nodes are Debian squeeze/wheezy (mixed), all running linux-3.2.35 from the Debian repos (from backports.debian.org on the squeeze box). Ceph 0.56.2 is installed from the ceph.com/debian-bobtail repo.

The client is running Debian wheezy with kernel 3.7.3 from Debian experimental; its Ceph packages are also installed from the ceph.com/debian-bobtail repo.

There is only one active mds, with two on standby. One thing which might be unusual is that I do not mount the cephfs root on the client but a subdir, i.e. mount -t ceph 172.16.0.4:6789:/home/ /home

I am experiencing truncated files: files get written successfully to Ceph and can be read without problems, i.e. md5sum is OK and even the file size shows the correct value, say 3.7MiB. Then, after some time during which the file receives no IO, it starts showing a file size of 2.0MiB and md5sum fails (as expected).

So far I have noticed that behaviour in only one subtree of my home folder, where I prepare software packages using "tar cvfj ..." to create a .tar.bz2 bundle. Those files suddenly get truncated to 2.0MiB after some time, but it seems that only newly created files are affected. Similar .tar.bz2 files which have been there since the initial rsync to Ceph don't get truncated at all.
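To pin down when a freshly written file shrinks, a small watcher along the following lines could be run on the client. This is only a sketch and not something taken from the setup described here: the function names, the polling interval and the number of rounds are made up for illustration. It also checks whether the new size is exactly 2 MiB (2097152 bytes), which is an assumption worth testing since "2.0MiB" is a rounded figure. Note that the polling loop only calls stat on the file and re-reads it once a size change is seen, but even stat is a metadata request to the mds and could in principle influence the behaviour.

```python
import hashlib
import os
import time

TWO_MIB = 2 * 1024 * 1024  # 2097152 bytes


def snapshot(path):
    """Return (size_in_bytes, md5_hex) for the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large bundles do not have to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return os.stat(path).st_size, digest.hexdigest()


def describe(before, after):
    """Summarise how a file changed between two snapshots."""
    if before == after:
        return "unchanged"
    note = " (exactly 2 MiB)" if after[0] == TWO_MIB else ""
    md5_state = "same" if before[1] == after[1] else "differs"
    return "size %d -> %d%s, md5 %s" % (before[0], after[0], note, md5_state)


def watch(path, interval=60, rounds=120):
    """Poll the file size; return the new snapshot on the first change, else None."""
    baseline = snapshot(path)
    for _ in range(rounds):
        time.sleep(interval)
        # stat only, so the file data itself is not read while waiting.
        if os.stat(path).st_size != baseline[0]:
            current = snapshot(path)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, path, describe(baseline, current))
            return current
    return None
```

Calling watch() on a newly created bundle right after tar finishes would give a timestamp for the size change that can be lined up with the mds and client logs.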

My ceph.conf looks like this:

[global]
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx

    public network = 172.16.0.0/16
    cluster network = 192.168.0.0/24

[osd]
    osd journal size = 1000

[mon.a]
    host = bellerophon
    mon addr = 172.16.0.4:6789

    public addr = 172.16.0.4
    cluster addr = 192.168.0.4

[mon.b]
    host = intrepid
    mon addr = 172.16.0.3:6789

    public addr = 172.16.0.3
    cluster addr = 192.168.0.3

[mon.c]
    host = voyager-b
    mon addr = 172.16.0.2:6789

    public addr = 172.16.0.2
    cluster addr = 192.168.0.2

[osd.0]
    host = bellerophon

    public addr = 172.16.0.4
    cluster addr = 192.168.0.4

[osd.1]
    host = bellerophon

    public addr = 172.16.0.4
    cluster addr = 192.168.0.4

[osd.2]
    host = intrepid

    public addr = 172.16.0.3
    cluster addr = 192.168.0.3

[osd.3]
    host = intrepid

    public addr = 172.16.0.3
    cluster addr = 192.168.0.3

[osd.4]
    host = voyager-b

    public addr = 172.16.0.2
    cluster addr = 192.168.0.2

[osd.5]
    host = voyager-b

    public addr = 172.16.0.2
    cluster addr = 192.168.0.2

[osd.6]
    host = voyager-b

    public addr = 172.16.0.2
    cluster addr = 192.168.0.2

[osd.7]
    host = bellerophon

    public addr = 172.16.0.4
    cluster addr = 192.168.0.4

[mds.a]
    host = bellerophon

    public addr = 172.16.0.4
    cluster addr = 192.168.0.4

[mds.b]
    host = intrepid

    public addr = 172.16.0.3
    cluster addr = 192.168.0.3

[mds.c]
    host = voyager-b

    public addr = 172.16.0.2
    cluster addr = 192.168.0.2

The only known issue I could find which might be related is this one: http://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg09703.html

Is there anything I can do to help track this down? What additional information should I provide? You might also find me on the ceph irc channel, nick topro.

Thanks, Tobi
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com