Re: CephFS metadata corruption on MDS restart

On Mon, Sep 9, 2013 at 3:29 PM, Tobias Prousa <topro@xxxxxx> wrote:
> Hi Ceph,
>
> I recently realized that whenever I'm forced to restart the MDS (i.e. it stalls
> or crashes due to excessive memory consumption; BTW, my MDS host has 32GB of RAM),
> especially while there are still clients with CephFS mounted, open files
> tend to have their metadata corrupted. Those files, when corrupted, invariably
> report a file size of exactly 4MiB, no matter what the real file
> size was. The rest of the metadata (name, date, ...) seems to be OK. I'm not
> 100% sure this is directly related to the MDS restart, but it certainly gives
> me that impression. Also, the files that get corrupted are those that were most
> likely open or written to recently. I cannot see anything
> suspicious in the logfiles, either.
>
> Some details on my setup:
>
> On the server side there are 3 nodes running Debian wheezy with Ceph dumpling
> (0.67.2-35-g17a7342 from gitbuilder, as 0.67.2 didn't get the MDS out of rejoin
> any more). Each node runs a MON and three OSDs; in addition, a single one of
> those nodes runs one MDS instance.
>
> Then there are 8 clients, also running Debian wheezy, with linux-3.9 from
> Debian backports, mounting the CephFS subdir 'home' as /home using the kernel
> client (I know 3.9 is rather old for that, but I found no way to mount a
> subdir of CephFS from fstab using ceph-fuse).
> My clients' fstab entry looks something like this:
> 172.16.17.3:6789:/home/  /home  ceph  name=admin,secretfile=/some/secret/file  0  0
>
> At first glance I couldn't find anything similar in the tracker; is anyone
> else experiencing similar issues?
>
> BTW, restarting the MDS gives me a headache every time, as it tends to refuse
> to come back up (reporting that it is active, but going into some endless
> cache-cleanup loop and not answering fs client requests), and the only way to
> get it up again is to increase the mds cache size. I ended up with a cache
> size of about 2M, which used so much memory that restarts were necessary about
> twice a day. So I shut down the MDS over the weekend, and after about 40 hours
> I was able to start it up again with a cache size of about 250k. Maybe that
> information is of some help to you.
>
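
For what it's worth, the mds cache size you were tuning is counted in inodes,
not bytes, and is set in ceph.conf on the MDS host. A minimal sketch, assuming
the dumpling-era option name and your value of 250k:

    [mds]
        # number of inodes the MDS may keep in its cache (default 100000)
        mds cache size = 250000

A change in ceph.conf takes effect the next time the MDS is restarted.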

This bug was fixed in the 3.11 kernel by commit ccca4e37b1 ("libceph:
fix truncate size calculation"). We don't backport CephFS bug fixes to
old kernels, so please update the kernel or use ceph-fuse.
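
Regarding mounting a CephFS subdirectory from fstab with ceph-fuse: later
versions of the Ceph documentation describe a mount.fuse.ceph helper that makes
this possible. A rough sketch, assuming that helper and the
ceph.client_mountpoint option are shipped in your packages (untested against
dumpling):

    # fuse.ceph passes ceph.* options through to ceph-fuse;
    # ceph.client_mountpoint selects the CephFS subdirectory to mount
    none  /home  fuse.ceph  ceph.id=admin,ceph.client_mountpoint=/home,_netdev,defaults  0  0

If the helper is not available, running "ceph-fuse -r /home /home" from an init
script achieves the same subdirectory mount.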

Regards
Yan, Zheng

> Best regards,
> Tobi
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



