Hi all,
I'm using CephFS on Hammer, and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.
I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM each, 20 clients with 2 GB of RAM each, and 2 OSD nodes, each with four 80 GB OSDs and 4 GB of RAM.
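To figure out which client the warning refers to before rebooting, I have been running something like this (just a rough sketch of my procedure; cephmds01 is my active MDS and the session ls command has to be run on that host via the admin socket):

  ceph health detail                     # repeats the warning together with the client id
  ceph daemon mds.cephmds01 session ls   # lists client sessions, including num_caps and the client address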
Here is my configuration:
[global]
fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
mon_initial_members = cephmds01
mon_host = 10.29.81.161
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.29.81.0/24
tcp nodelay = true
tcp rcvbuf = 0
ms tcp read timeout = 600
#Capacity
mon osd full ratio = .95
mon osd nearfull ratio = .85
[osd]
osd journal size = 1024
journal dio = true
journal aio = true
osd op threads = 2
osd op thread timeout = 60
osd disk threads = 2
osd recovery threads = 1
osd recovery max active = 1
osd max backfills = 2
# Pool
osd pool default size = 2
#XFS
osd mkfs type = xfs
osd mkfs options xfs = "-f -i size=2048"
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
#FileStore Settings
filestore xattr use omap = false
filestore max inline xattr size = 512
filestore max sync interval = 10
filestore merge threshold = 40
filestore split multiple = 8
filestore flusher = false
filestore queue max ops = 2000
filestore queue max bytes = 536870912
filestore queue committing max ops = 500
filestore queue committing max bytes = 268435456
filestore op threads = 2
[mds]
max mds = 1
mds cache size = 750000
client cache size = 2048
mds dir commit ratio = 0.5
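For reference, this is roughly how I compare the MDS cache usage against mds cache size (a sketch using the admin socket on the active MDS; I'm not sure these are the right counters to watch):

  ceph daemon mds.cephmds01 config show | grep mds_cache_size   # confirms the running value (750000 here)
  ceph daemon mds.cephmds01 perf dump | grep -i inode           # cached inode counters, which should stay below mds cache size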
Here is the ceph -s output:
root@service-new:~# ceph -s
    cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
     health HEALTH_WARN
            mds0: Client 94102 failing to respond to cache pressure
     monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
            election epoch 34, quorum 0,1 cephmds02,cephmds01
     mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
     osdmap e669: 8 osds: 8 up, 8 in
      pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
            288 GB used, 342 GB / 631 GB avail
                 256 active+clean
  client io 3091 kB/s rd, 342 op/s
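On the client side, the only thing I have found to check is the kernel client's debugfs view of its caps and in-flight MDS requests (this assumes the clients use the kernel mount and that debugfs is available; the paths are from memory, so treat this as a sketch):

  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  cat /sys/kernel/debug/ceph/*/caps   # summary of caps held/reserved by this client
  cat /sys/kernel/debug/ceph/*/mdsc   # MDS requests still in flight, if any are stuck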
Thank you.
Regards,
Matteo