The issue hasn't popped up since I upgraded the kernel, so the problem I was
experiencing seems to have been addressed.

On Tue, Sep 9, 2014 at 12:13 PM, James Devine <fxmulder at gmail.com> wrote:

> The issue isn't so much mounting the ceph client as it is the mounted ceph
> client becoming unusable, requiring a remount. So far so good though.
>
>
> On Fri, Sep 5, 2014 at 5:53 PM, JIten Shah <jshah2005 at me.com> wrote:
>
>> We ran into the same issue where we could not mount the filesystem on the
>> clients because they had 3.9. Once we upgraded the kernel on the client
>> node, we were able to mount it fine. FWIW, you need kernel 3.14 and above.
>>
>> --jiten
>>
>> On Sep 5, 2014, at 6:55 AM, James Devine <fxmulder at gmail.com> wrote:
>>
>> No messages in dmesg. I've updated the two clients to 3.16; we'll see if
>> that fixes this issue.
>>
>>
>> On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng <ukernel at gmail.com> wrote:
>>
>>> On Fri, Sep 5, 2014 at 8:42 AM, James Devine <fxmulder at gmail.com> wrote:
>>> > I'm using 3.13.0-35-generic on Ubuntu 14.04.1
>>> >
>>>
>>> Was there any kernel message when the hang happened? We have fixed a
>>> few bugs since the 3.13 kernel; please use the 3.16 kernel if possible.
>>>
>>> Yan, Zheng
>>>
>>> >
>>> > On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng <ukernel at gmail.com> wrote:
>>> >>
>>> >> On Fri, Sep 5, 2014 at 3:24 AM, James Devine <fxmulder at gmail.com> wrote:
>>> >> > It took a week to happen again; I had hopes that it was fixed, but
>>> >> > alas it is not. Looking at top logs on the active mds server, the
>>> >> > load average was 0.00 the whole time and memory usage never changed
>>> >> > much; it is using close to 100% and some swap, but since I changed
>>> >> > memory.swappiness swap usage hasn't gone up and has been slowly
>>> >> > coming back down. Same symptoms: the mount on the client is
>>> >> > unresponsive and a cat of /sys/kernel/debug/ceph/*/mdsc had a whole
>>> >> > list of entries. A umount and remount seems to fix it.
>>> >> >
>>> >>
>>> >> Which version of the kernel do you use?
>>> >>
>>> >> Yan, Zheng
>>> >>
>>> >> >
>>> >> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine <fxmulder at gmail.com> wrote:
>>> >> >>
>>> >> >> I am running active/standby and it didn't swap over to the standby.
>>> >> >> If I shut down the active server it swaps to the standby fine,
>>> >> >> though. When there were issues, disk access would back up on the
>>> >> >> webstats servers and a cat of /sys/kernel/debug/ceph/*/mdsc would
>>> >> >> have a list of entries, whereas normally it would only list one or
>>> >> >> two if any. I have 4 cores and 2GB of ram on the mds machines.
>>> >> >> Watching it right now it is using most of the ram and some of swap,
>>> >> >> although most of the active ram is disk cache. I lowered the
>>> >> >> memory.swappiness value to see if that helps. I'm also logging top
>>> >> >> output if it happens again.
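[The check described above boils down to a couple of one-liners on the client
and MDS hosts. A minimal sketch, assuming debugfs is mounted at
/sys/kernel/debug on the client, and assuming the swappiness change maps to
the vm.swappiness sysctl (the thread says memory.swappiness, which may refer
to the cgroup knob instead):

    # count in-flight MDS requests per mounted cephfs instance
    # (a hung mount shows a long, unchanging list here)
    for f in /sys/kernel/debug/ceph/*/mdsc; do
        echo "$f: $(wc -l < "$f") pending request(s)"
    done

    # the kernel client wants 3.14+, ideally 3.16, per this thread
    uname -r

    # how aggressively the MDS host swaps (lowered in this thread)
    sysctl vm.swappiness
]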
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng <ukernel at gmail.com> wrote:
>>> >> >>>
>>> >> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine <fxmulder at gmail.com> wrote:
>>> >> >>> >
>>> >> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum <greg at inktank.com> wrote:
>>> >> >>> >>
>>> >> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>>> >> >>> >> <bhuffman at etinternational.com> wrote:
>>> >> >>> >> > Is Ceph Filesystem ready for production servers?
>>> >> >>> >> >
>>> >> >>> >> > The documentation says it's not, but I don't see that
>>> >> >>> >> > mentioned anywhere else.
>>> >> >>> >> > http://ceph.com/docs/master/cephfs/
>>> >> >>> >>
>>> >> >>> >> Everybody has their own standards, but Red Hat isn't supporting
>>> >> >>> >> it for general production use at this time. If you're brave you
>>> >> >>> >> could test it under your workload for a while and see how it
>>> >> >>> >> comes out; the known issues are very much workload-dependent (or
>>> >> >>> >> just general concerns over polish).
>>> >> >>> >> -Greg
>>> >> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > I've been testing it with our webstats since it gets live hits
>>> >> >>> > but isn't customer affecting. Seems the MDS server has problems
>>> >> >>> > every few days, requiring me to umount and remount the ceph disk
>>> >> >>> > to resolve. Not sure if the issue is resolved in development
>>> >> >>> > versions, but as of 0.80.5 we seem to be hitting it. I set the
>>> >> >>> > log verbosity to 20 so there are tons of logs, but it ends with
>>> >> >>>
>>> >> >>> The cephfs client is supposed to be able to handle MDS takeover.
>>> >> >>> What symptom makes you umount and remount the cephfs?
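[The workaround described in this thread is a forced unmount followed by a
fresh kernel-client mount. A rough sketch, with /mnt/ceph, the monitor
address 10.251.188.124:6789, and the secretfile path as placeholders; the
real values would come from fstab or ceph.conf:

    # a hung mount usually won't unmount cleanly; try force, then lazy
    umount -f /mnt/ceph || umount -l /mnt/ceph

    # remount the kernel cephfs client against a monitor
    mount -t ceph 10.251.188.124:6789:/ /mnt/ceph \
        -o name=admin,secretfile=/etc/ceph/admin.secret
]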
>>> >> >>>
>>> >> >>> >
>>> >> >>> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14 laggy, deferring client_request(client.92141:6795587 getattr pAsLsXsFs #10000026dc1)
>>> >> >>> > 2014-08-24 07:10:19.682021 7f2b575e7700  5 mds.0.14 is_laggy 19.324963 > 15 since last acked beacon
>>> >> >>> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active seq 127220 (currently up:active)
>>> >> >>> > 2014-08-24 07:10:21.515899 7f2b575e7700  5 mds.0.14 is_laggy 21.158841 > 15 since last acked beacon
>>> >> >>> > 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14 laggy, deferring client_session(request_renewcaps seq 26766)
>>> >> >>> > 2014-08-24 07:10:21.515915 7f2b575e7700  5 mds.0.14 is_laggy 21.158857 > 15 since last acked beacon
>>> >> >>> > 2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map need_to_purge={}
>>> >> >>> > 2014-08-24 07:10:21.981176 7f2b575e7700  5 mds.0.14 is_laggy 21.624117 > 15 since last acked beacon
>>> >> >>> > 2014-08-24 07:10:23.170528 7f2b575e7700  5 mds.0.14 handle_mds_map epoch 93 from mon.0
>>> >> >>> > 2014-08-24 07:10:23.175367 7f2b532d5700  0 -- 10.251.188.124:6800/985 >> 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1 l=0 c=0x2cbfb20).fault with nothing to send, going to standby
>>> >> >>> > 2014-08-24 07:10:23.175376 7f2b533d6700  0 -- 10.251.188.124:6800/985 >> 10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf700).fault with nothing to send, going to standby
>>> >> >>> > 2014-08-24 07:10:23.175380 7f2b531d4700  0 -- 10.251.188.124:6800/985 >> 10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1 l=0 c=0x2cbfde0).fault with nothing to send, going to standby
>>> >> >>> > 2014-08-24 07:10:23.175438 7f2b534d7700  0 -- 10.251.188.124:6800/985 >> 10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf5a0).fault with nothing to send, going to standby
>>> >> >>> > 2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data}
>>> >> >>> > 2014-08-24 07:10:23.184255 7f2b575e7700 10 mds.0.14 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
>>> >> >>> > 2014-08-24 07:10:23.184264 7f2b575e7700 10 mds.-1.-1 map says i am 10.251.188.124:6800/985 mds.-1.-1 state down:dne
>>> >> >>> > 2014-08-24 07:10:23.184275 7f2b575e7700 10 mds.-1.-1 peer mds gid 94665 removed from map
>>> >> >>> > 2014-08-24 07:10:23.184282 7f2b575e7700  1 mds.-1.-1 handle_mds_map i (10.251.188.124:6800/985) dne in the mdsmap, respawning myself
>>> >> >>> > 2014-08-24 07:10:23.184284 7f2b575e7700  1 mds.-1.-1 respawn
>>> >> >>> > 2014-08-24 07:10:23.184286 7f2b575e7700  1 mds.-1.-1  e: '/usr/bin/ceph-mds'
>>> >> >>> > 2014-08-24 07:10:23.184288 7f2b575e7700  1 mds.-1.-1  0: '/usr/bin/ceph-mds'
>>> >> >>> > 2014-08-24 07:10:23.184289 7f2b575e7700  1 mds.-1.-1  1: '-i'
>>> >> >>> > 2014-08-24 07:10:23.184290 7f2b575e7700  1 mds.-1.-1  2: 'ceph-cluster1-mds2'
>>> >> >>> > 2014-08-24 07:10:23.184291 7f2b575e7700  1 mds.-1.-1  3: '--pid-file'
>>> >> >>> > 2014-08-24 07:10:23.184292 7f2b575e7700  1 mds.-1.-1  4: '/var/run/ceph/mds.ceph-cluster1-mds2.pid'
>>> >> >>> > 2014-08-24 07:10:23.184293 7f2b575e7700  1 mds.-1.-1  5: '-c'
>>> >> >>> > 2014-08-24 07:10:23.184294 7f2b575e7700  1 mds.-1.-1  6: '/etc/ceph/ceph.conf'
>>> >> >>> > 2014-08-24 07:10:23.184295 7f2b575e7700  1 mds.-1.-1  7: '--cluster'
>>> >> >>> > 2014-08-24 07:10:23.184296 7f2b575e7700  1 mds.-1.-1  8: 'ceph'
>>> >> >>> > 2014-08-24 07:10:23.274640 7f2b575e7700  1 mds.-1.-1 exe_path /usr/bin/ceph-mds
>>> >> >>> > 2014-08-24 07:10:23.606875 7f4c55abb800  0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 987
>>> >> >>> > 2014-08-24 07:10:49.024862 7f4c506ad700  1 mds.-1.0 handle_mds_map standby
>>> >> >>> > 2014-08-24 07:10:49.199676 7f4c506ad700  0 mds.-1.0 handle_mds_beacon no longer laggy
>>> >> >>> > 2014-08-24 07:10:50.215240 7f4c506ad700  1 mds.-1.0 handle_mds_map standby
>>> >> >>> > 2014-08-24 07:10:51.290407 7f4c506ad700  1 mds.-1.0 handle_mds_map standby
>>> >> >>> >
>>> >> >>> >
>>> >> >>>
>>> >> >>> Did you use an active/standby MDS setup? Did the MDS use lots of
>>> >> >>> memory before it crashed?
>>> >> >>>
>>> >> >>> Regards
>>> >> >>> Yan, Zheng
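[For the active/standby question, the usual way to confirm a standby is
registered and to watch a takeover is from the monitor side. This is a sketch
only; ceph-cluster1-mds2 is the daemon name taken from the respawn log above,
and the exact injectargs spelling can vary by release:

    # which MDS rank is active, and whether a standby is registered
    ceph mds stat

    # overall cluster state, including laggy or failed MDS daemons
    ceph -s

    # force rank 0 over to the standby to exercise takeover (use with care)
    ceph mds fail 0

    # raise MDS logging to 20 without a restart, as was done in the thread
    ceph tell mds.ceph-cluster1-mds2 injectargs '--debug-mds 20'

Running ceph mds stat at the moment of a hang would show whether rank 0 was
still claimed by the laggy daemon or had already moved to the standby.]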