CephFS metadata corruption on MDS restart

Hi Ceph,
 
I recently realized that whenever I'm forced to restart the MDS (i.e. after a stall or a crash due to excessive memory consumption; btw. my MDS host has 32GB of RAM), especially while clients still have CephFS mounted, open files tend to end up with corrupted metadata. When corrupted, those files always report a file size of exactly 4MiB, no matter what the real size was. The rest of the metadata (name, dates, ...) seems to be okay. I'm not 100% sure this is directly related to the MDS restart, but that's the impression it gives me; the files that get corrupted are also the ones that were most likely open or written to most recently. I can't see anything suspicious in the logs either.
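In case it helps: 4MiB happens to be the default CephFS object size, so a rough way to check whether the data behind such a file is still intact in RADOS (and only the size reported by the MDS is wrong) would be something like the following; the file path is just a placeholder and 'data' is the default CephFS data pool:

    # inode number of an affected file, in hex (backing objects are named <inode-hex>.<index>)
    INO=$(printf '%x' $(stat -c %i /home/some/corrupted-file))
    # list the RADOS objects backing that inode (can be slow on a large data pool)
    rados -p data ls | grep "^${INO}\."
    # with the default layout each object holds at most 4MiB, so any objects beyond
    # the first one mean the data past 4MiB is still there
    rados -p data stat ${INO}.00000000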
 
Some details on my setup:
 
As servers there are 3 nodes running Debian wheezy with Ceph dumpling (0.67.2-35-g17a7342 from gitbuilder, since plain 0.67.2 didn't get the MDS out of rejoin any more). Each node runs one MON and three OSDs, and a single one of those nodes additionally runs the only MDS instance.
 
Then there are 8 clients, also running Debian wheezy, with linux-3.9 from Debian backports, mounting the CephFS subdirectory 'home' as /home via the kernel client (I know 3.9 is rather old for that, but I found no way to mount a CephFS subdirectory from fstab using ceph-fuse).
My clients' fstab entry looks something like this:
172.16.17.3:6789:/home/          /home           ceph    name=admin,secretfile=/some/secret/file     0       0
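For reference, the equivalent manual mount with the kernel client would be roughly the following (same monitor address, client name and secret file as in the fstab line above):

    # kernel client mount of the CephFS subdirectory 'home'
    mount -t ceph 172.16.17.3:6789:/home/ /home \
        -o name=admin,secretfile=/some/secret/file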
 
At a first look I couldn't find anything similar in the tracker; is anyone else experiencing similar issues?
 
Btw. restarting the MDS gives me some headache every time, as it tends to refuse to come back properly: it reports being active but goes into what looks like an endless cache cleanup loop and stops answering fs client requests, and the only way to get it up again is to increase mds cache size. I ended up with a cache size of about 2M, which used so much memory that restarts became necessary about twice a day. So I shut down the MDS over the weekend, and after about 40 hours I was able to start it up again with a cache size of about 250k. Maybe that information is of some help to you.
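For clarity, by "increasing mds cache size" I mean something along these lines (the mds name 'a' and the admin socket path are just the defaults on my node, adjust as needed):

    # in ceph.conf on the MDS node
    [mds]
        mds cache size = 2000000

    # or changed on the running daemon via the admin socket
    ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok config set mds_cache_size 2000000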
 
Best regards,
Tobi
