On Mon, Sep 9, 2013 at 3:29 PM, Tobias Prousa <topro@xxxxxx> wrote:
> Hi Ceph,
>
> I recently realized that whenever I'm forced to restart the MDS (i.e. a stall
> or crash due to excessive memory consumption; btw. my MDS host has 32GB of
> RAM), especially while there are still clients with CephFS mounted, open
> files tend to have their metadata corrupted. Corrupted files uniformly report
> a file size of exactly 4MiB, no matter what the real file size was. The rest
> of the metadata (name, date, ...) seems to be fine. I'm not 100% sure this is
> directly related to the MDS restart, but that is the impression it gives me.
> The files that get corrupted are also the ones that were most likely open or
> written to most recently. I cannot see anything suspicious in the log files,
> either.
>
> Some details on my setup:
>
> The servers are 3 nodes running Debian wheezy with Ceph dumpling
> (0.67.2-35-g17a7342 from gitbuilder, as 0.67.2 didn't get the MDS out of
> rejoin any more). Each node runs one MON and three OSDs; in addition, a
> single one of those nodes runs one MDS instance.
>
> Then there are 8 clients, also running Debian wheezy, with linux-3.9 from
> Debian backports, mounting the CephFS subdir 'home' as /home using the kernel
> client (I know 3.9 is rather old for that, but I found no way to mount a
> subdir of CephFS from fstab using ceph-fuse).
> My clients' fstab entry looks something like this:
> 172.16.17.3:6789:/home/ /home ceph name=admin,secretfile=/some/secret/file 0 0
>
> On a first look I couldn't find anything similar in the tracker. Is anyone
> experiencing similar issues?
>
> Btw. restarting the MDS gives me some headache every time, as it tends to
> refuse to come back up properly (reporting being active but going into some
> endless cache-cleanup loop and not answering fs client requests), and the
> only way to get it up again is to increase mds cache size. I ended up with a
> cache size of about 2M, which used so much memory that restarts were
> necessary about twice a day. So I shut the MDS down over the weekend, and
> after about 40 hours I was able to start it up again with a cache size of
> about 250k. Maybe that information is of some help to you.

The bug has been fixed in the 3.11 kernel by commit ccca4e37b1 ("libceph: fix
truncate size calculation"). We don't backport cephfs bug fixes to old
kernels; please update the kernel or use ceph-fuse.

Regards
Yan, Zheng

> Best regards,
> Tobi
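
[Editor's note] For anyone stuck on the same subdir limitation mentioned above:
ceph-fuse can mount a subtree of CephFS directly via its -r option, so a rough
equivalent of the kernel-client fstab entry would be something like the command
below. The monitor address and client name are taken from the original post;
option spelling may vary between ceph-fuse versions, so treat this as a sketch
rather than a verified invocation:

    # mount only the 'home' subtree of CephFS at /home via ceph-fuse
    # (client.admin keyring is assumed to be in the default location)
    ceph-fuse -m 172.16.17.3:6789 -r /home --id admin /home

Whether an equivalent fstab entry for ceph-fuse (fuse.ceph filesystem type) is
available depends on the ceph version installed on the client.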