On Fri, Aug 29, 2014 at 8:36 AM, James Devine <fxmulder at gmail.com> wrote:
>
> On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum <greg at inktank.com> wrote:
>>
>> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>> <bhuffman at etinternational.com> wrote:
>> > Is Ceph Filesystem ready for production servers?
>> >
>> > The documentation says it's not, but I don't see that mentioned
>> > anywhere else.
>> > http://ceph.com/docs/master/cephfs/
>>
>> Everybody has their own standards, but Red Hat isn't supporting it for
>> general production use at this time. If you're brave you could test it
>> under your workload for a while and see how it comes out; the known
>> issues are very much workload-dependent (or just general concerns over
>> polish).
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> I've been testing it with our webstats since it gets live hits but isn't
> customer-affecting. The MDS server seems to have problems every few days,
> requiring me to umount and remount the ceph disk to resolve them. Not
> sure if the issue is fixed in development versions, but as of 0.80.5 we
> seem to be hitting it. I set the log verbosity to 20, so there are tons
> of logs; it ends with:

The cephfs client is supposed to be able to handle MDS takeover. What
symptom makes you umount and remount the cephfs?
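For anyone reading along: the "is_laggy ... > 15" lines in the log below
compare the time since the MDS's last acknowledged beacon against its grace
period, and the monitors remove an MDS from the map once its beacons stop
arriving. As a rough sketch, these are the knobs involved (the values shown
are the Firefly defaults; raising the grace only hides whatever is stalling
the beacons, e.g. memory pressure or a blocked journal):

    [mds]
            # seconds between beacons sent from the MDS to the monitors
            mds beacon interval = 4
            # seconds without an acked beacon before the MDS considers
            # itself laggy and the monitors may mark it failed
            mds beacon grace = 15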
> 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14 laggy, deferring client_request(client.92141:6795587 getattr pAsLsXsFs #10000026dc1)
> 2014-08-24 07:10:19.682021 7f2b575e7700 5 mds.0.14 is_laggy 19.324963 > 15 since last acked beacon
> 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active seq 127220 (currently up:active)
> 2014-08-24 07:10:21.515899 7f2b575e7700 5 mds.0.14 is_laggy 21.158841 > 15 since last acked beacon
> 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14 laggy, deferring client_session(request_renewcaps seq 26766)
> 2014-08-24 07:10:21.515915 7f2b575e7700 5 mds.0.14 is_laggy 21.158857 > 15 since last acked beacon
> 2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map need_to_purge={}
> 2014-08-24 07:10:21.981176 7f2b575e7700 5 mds.0.14 is_laggy 21.624117 > 15 since last acked beacon
> 2014-08-24 07:10:23.170528 7f2b575e7700 5 mds.0.14 handle_mds_map epoch 93 from mon.0
> 2014-08-24 07:10:23.175367 7f2b532d5700 0 -- 10.251.188.124:6800/985 >> 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1 l=0 c=0x2cbfb20).fault with nothing to send, going to standby
> 2014-08-24 07:10:23.175376 7f2b533d6700 0 -- 10.251.188.124:6800/985 >> 10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf700).fault with nothing to send, going to standby
> 2014-08-24 07:10:23.175380 7f2b531d4700 0 -- 10.251.188.124:6800/985 >> 10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1 l=0 c=0x2cbfde0).fault with nothing to send, going to standby
> 2014-08-24 07:10:23.175438 7f2b534d7700 0 -- 10.251.188.124:6800/985 >> 10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf5a0).fault with nothing to send, going to standby
> 2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data}
> 2014-08-24 07:10:23.184255 7f2b575e7700 10 mds.0.14 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
> 2014-08-24 07:10:23.184264 7f2b575e7700 10 mds.-1.-1 map says i am 10.251.188.124:6800/985 mds.-1.-1 state down:dne
> 2014-08-24 07:10:23.184275 7f2b575e7700 10 mds.-1.-1 peer mds gid 94665 removed from map
> 2014-08-24 07:10:23.184282 7f2b575e7700 1 mds.-1.-1 handle_mds_map i (10.251.188.124:6800/985) dne in the mdsmap, respawning myself
> 2014-08-24 07:10:23.184284 7f2b575e7700 1 mds.-1.-1 respawn
> 2014-08-24 07:10:23.184286 7f2b575e7700 1 mds.-1.-1 e: '/usr/bin/ceph-mds'
> 2014-08-24 07:10:23.184288 7f2b575e7700 1 mds.-1.-1 0: '/usr/bin/ceph-mds'
> 2014-08-24 07:10:23.184289 7f2b575e7700 1 mds.-1.-1 1: '-i'
> 2014-08-24 07:10:23.184290 7f2b575e7700 1 mds.-1.-1 2: 'ceph-cluster1-mds2'
> 2014-08-24 07:10:23.184291 7f2b575e7700 1 mds.-1.-1 3: '--pid-file'
> 2014-08-24 07:10:23.184292 7f2b575e7700 1 mds.-1.-1 4: '/var/run/ceph/mds.ceph-cluster1-mds2.pid'
> 2014-08-24 07:10:23.184293 7f2b575e7700 1 mds.-1.-1 5: '-c'
> 2014-08-24 07:10:23.184294 7f2b575e7700 1 mds.-1.-1 6: '/etc/ceph/ceph.conf'
> 2014-08-24 07:10:23.184295 7f2b575e7700 1 mds.-1.-1 7: '--cluster'
> 2014-08-24 07:10:23.184296 7f2b575e7700 1 mds.-1.-1 8: 'ceph'
> 2014-08-24 07:10:23.274640 7f2b575e7700 1 mds.-1.-1 exe_path /usr/bin/ceph-mds
> 2014-08-24 07:10:23.606875 7f4c55abb800 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 987
> 2014-08-24 07:10:49.024862 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
> 2014-08-24 07:10:49.199676 7f4c506ad700 0 mds.-1.0 handle_mds_beacon no longer laggy
> 2014-08-24 07:10:50.215240 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
> 2014-08-24 07:10:51.290407 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
>

Did you use an active/standby MDS setup? Did the MDS use a lot of memory
before it crashed?

Regards
Yan, Zheng

>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
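A footnote on the active/standby question, for the archive: a standby MDS
needs no special configuration. Any extra ceph-mds daemon beyond max_mds
registers with the monitors as a standby and takes over when the active MDS
is marked failed; the respawned daemon in the log above is doing exactly
that ("handle_mds_map standby"). A minimal sketch, assuming a hypothetical
second daemon ceph-cluster1-mds1 alongside the ceph-cluster1-mds2 seen in
the log:

    # /etc/ceph/ceph.conf -- one section per MDS daemon
    # (ceph-cluster1-mds1 and the host names are made up for illustration)
    [mds.ceph-cluster1-mds1]
            host = mds1
    [mds.ceph-cluster1-mds2]
            host = mds2

With max_mds left at 1, starting both daemons should leave one up:active
and one up:standby, which "ceph mds stat" will report. For Zheng's memory
question, watching the ceph-mds resident set size in top on the MDS host
before the failover is the simplest check.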