I'm using kernel 3.13.0-35-generic on Ubuntu 14.04.1.

On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng <ukernel at gmail.com> wrote:

> On Fri, Sep 5, 2014 at 3:24 AM, James Devine <fxmulder at gmail.com> wrote:
> > It took a week to happen again; I had hopes that it was fixed, but alas
> > it is not. Looking at top logs on the active MDS server, the load
> > average was 0.00 the whole time and memory usage never changed much: it
> > stays close to 100% with some swap in use, but since I changed
> > memory.swappiness, swap usage hasn't gone up and has been slowly coming
> > back down. Same symptoms: the mount on the client is unresponsive and a
> > cat of /sys/kernel/debug/ceph/*/mdsc shows a whole list of entries. A
> > umount and remount seems to fix it.
>
> Which version of the kernel do you use?
>
> Yan, Zheng
>
> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine <fxmulder at gmail.com> wrote:
> >>
> >> I am running active/standby and it didn't fail over to the standby. If
> >> I shut down the active server, it fails over to the standby fine,
> >> though. When there were issues, disk access would back up on the
> >> webstats servers and a cat of /sys/kernel/debug/ceph/*/mdsc would show
> >> a list of entries, whereas normally it would only list one or two, if
> >> any. I have 4 cores and 2GB of RAM on the MDS machines. Watching it
> >> right now, it is using most of the RAM and some swap, although most of
> >> the active RAM is disk cache. I lowered the memory.swappiness value to
> >> see if that helps. I'm also logging top output in case it happens
> >> again.
> >>
> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng <ukernel at gmail.com> wrote:
> >>>
> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine <fxmulder at gmail.com> wrote:
> >>> >
> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum <greg at inktank.com> wrote:
> >>> >>
> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
> >>> >> <bhuffman at etinternational.com> wrote:
> >>> >> > Is Ceph Filesystem ready for production servers?
> >>> >> >
> >>> >> > The documentation says it's not, but I don't see that mentioned
> >>> >> > anywhere else.
> >>> >> > http://ceph.com/docs/master/cephfs/
> >>> >>
> >>> >> Everybody has their own standards, but Red Hat isn't supporting it
> >>> >> for general production use at this time. If you're brave, you could
> >>> >> test it under your workload for a while and see how it comes out;
> >>> >> the known issues are very much workload-dependent (or just general
> >>> >> concerns over polish).
> >>> >> -Greg
> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>> >
> >>> > I've been testing it with our webstats since it gets live hits but
> >>> > isn't customer-affecting. It seems the MDS server has problems every
> >>> > few days, requiring me to umount and remount the Ceph mount to
> >>> > resolve them. Not sure if the issue is resolved in development
> >>> > versions, but as of 0.80.5 we seem to be hitting it. I set the log
> >>> > verbosity to 20, so there are tons of logs, but it ends with:
> >>>
> >>> The cephfs client is supposed to be able to handle MDS takeover.
> >>> What symptom makes you umount and remount the cephfs?
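
A scripted version of the client-side check and workaround described above, as a minimal sketch: it assumes a kernel CephFS mount at /mnt/cephfs, a monitor reachable at mon1:6789 and a secret file path (all placeholders), and that debugfs is mounted at /sys/kernel/debug. vm.swappiness is the system-wide sysctl counterpart of the per-cgroup memory.swappiness file mentioned above.

    # On the client: list in-flight MDS requests.  A healthy mount shows
    # nothing or a couple of short-lived entries; a hung mount shows a long
    # list that never drains.
    for f in /sys/kernel/debug/ceph/*/mdsc; do
        echo "== $f =="
        cat "$f"
    done

    # On the MDS host: make the kernel less eager to push the MDS into swap.
    sysctl vm.swappiness=10

    # On the client: the workaround that recovers a hung mount, a lazy
    # unmount followed by a fresh mount (mount point, monitor address and
    # secret file are placeholders).
    umount -l /mnt/cephfs
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
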
> >>> >
> >>> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14 laggy, deferring client_request(client.92141:6795587 getattr pAsLsXsFs #10000026dc1)
> >>> > 2014-08-24 07:10:19.682021 7f2b575e7700 5 mds.0.14 is_laggy 19.324963 > 15 since last acked beacon
> >>> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active seq 127220 (currently up:active)
> >>> > 2014-08-24 07:10:21.515899 7f2b575e7700 5 mds.0.14 is_laggy 21.158841 > 15 since last acked beacon
> >>> > 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14 laggy, deferring client_session(request_renewcaps seq 26766)
> >>> > 2014-08-24 07:10:21.515915 7f2b575e7700 5 mds.0.14 is_laggy 21.158857 > 15 since last acked beacon
> >>> > 2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map need_to_purge={}
> >>> > 2014-08-24 07:10:21.981176 7f2b575e7700 5 mds.0.14 is_laggy 21.624117 > 15 since last acked beacon
> >>> > 2014-08-24 07:10:23.170528 7f2b575e7700 5 mds.0.14 handle_mds_map epoch 93 from mon.0
> >>> > 2014-08-24 07:10:23.175367 7f2b532d5700 0 -- 10.251.188.124:6800/985 >> 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1 l=0 c=0x2cbfb20).fault with nothing to send, going to standby
> >>> > 2014-08-24 07:10:23.175376 7f2b533d6700 0 -- 10.251.188.124:6800/985 >> 10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf700).fault with nothing to send, going to standby
> >>> > 2014-08-24 07:10:23.175380 7f2b531d4700 0 -- 10.251.188.124:6800/985 >> 10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1 l=0 c=0x2cbfde0).fault with nothing to send, going to standby
> >>> > 2014-08-24 07:10:23.175438 7f2b534d7700 0 -- 10.251.188.124:6800/985 >> 10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf5a0).fault with nothing to send, going to standby
> >>> > 2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data}
> >>> > 2014-08-24 07:10:23.184255 7f2b575e7700 10 mds.0.14 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
> >>> > 2014-08-24 07:10:23.184264 7f2b575e7700 10 mds.-1.-1 map says i am 10.251.188.124:6800/985 mds.-1.-1 state down:dne
> >>> > 2014-08-24 07:10:23.184275 7f2b575e7700 10 mds.-1.-1 peer mds gid 94665 removed from map
> >>> > 2014-08-24 07:10:23.184282 7f2b575e7700 1 mds.-1.-1 handle_mds_map i (10.251.188.124:6800/985) dne in the mdsmap, respawning myself
> >>> > 2014-08-24 07:10:23.184284 7f2b575e7700 1 mds.-1.-1 respawn
> >>> > 2014-08-24 07:10:23.184286 7f2b575e7700 1 mds.-1.-1  e: '/usr/bin/ceph-mds'
> >>> > 2014-08-24 07:10:23.184288 7f2b575e7700 1 mds.-1.-1  0: '/usr/bin/ceph-mds'
> >>> > 2014-08-24 07:10:23.184289 7f2b575e7700 1 mds.-1.-1  1: '-i'
> >>> > 2014-08-24 07:10:23.184290 7f2b575e7700 1 mds.-1.-1  2: 'ceph-cluster1-mds2'
> >>> > 2014-08-24 07:10:23.184291 7f2b575e7700 1 mds.-1.-1  3: '--pid-file'
> >>> > 2014-08-24 07:10:23.184292 7f2b575e7700 1 mds.-1.-1  4: '/var/run/ceph/mds.ceph-cluster1-mds2.pid'
> >>> > 2014-08-24 07:10:23.184293 7f2b575e7700 1 mds.-1.-1  5: '-c'
> >>> > 2014-08-24 07:10:23.184294 7f2b575e7700 1 mds.-1.-1  6: '/etc/ceph/ceph.conf'
> >>> > 2014-08-24 07:10:23.184295 7f2b575e7700 1 mds.-1.-1  7: '--cluster'
> >>> > 2014-08-24 07:10:23.184296 7f2b575e7700 1 mds.-1.-1  8: 'ceph'
> >>> > 2014-08-24 07:10:23.274640 7f2b575e7700 1 mds.-1.-1 exe_path /usr/bin/ceph-mds
> >>> > 2014-08-24 07:10:23.606875 7f4c55abb800 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 987
> >>> > 2014-08-24 07:10:49.024862 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
> >>> > 2014-08-24 07:10:49.199676 7f4c506ad700 0 mds.-1.0 handle_mds_beacon no longer laggy
> >>> > 2014-08-24 07:10:50.215240 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
> >>> > 2014-08-24 07:10:51.290407 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
> >>>
> >>> Did you use an active/standby MDS setup? Did the MDS use lots of memory
> >>> before it crashed?
> >>>
> >>> Regards,
> >>> Yan, Zheng
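
The repeated "is_laggy <delta> > 15 since last acked beacon" lines in the log above mean the MDS went longer than the beacon grace period (15 seconds by default) without the monitors acknowledging its beacon, so it deferred client requests; once the new MDS map no longer contained it ("state down:dne"), it respawned itself, which is the restart into standby visible at 07:10:23. If the monitors are merely slow to respond rather than the MDS being dead, the grace period can be loosened while investigating. A minimal sketch of the relevant ceph.conf settings, placed in [global] so that both the monitors and the MDS should pick them up; the values shown are illustrative, not a recommendation:

    [global]
            # how often the MDS sends a beacon to the monitors (seconds)
            mds beacon interval = 4
            # how long without an acked beacon before the MDS is considered
            # laggy and eventually replaced (seconds, default 15)
            mds beacon grace = 60

Restart the ceph-mon and ceph-mds daemons through your init system after changing this.
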
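On the active/standby question at the end of the thread: the MDS map shows which daemon currently holds the active rank and which daemons are standbys, and a periodic memory snapshot of the ceph-mds process is a lighter-weight alternative to logging full top output. A minimal sketch using firefly-era (0.80.x) commands, run from a node with an admin keyring:

    # Which MDS is active, and whether any standbys are registered.
    ceph mds stat
    ceph mds dump

    # Overall cluster health, including whether the monitors currently
    # consider the MDS laggy.
    ceph -s

    # On the MDS host: memory footprint of the MDS daemon (RSS/VSZ in KiB),
    # suitable for dropping into a cron job while reproducing the problem.
    ps -C ceph-mds -o pid,rss,vsz,cmd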