Cluster always in WARN state, failing to respond to cache pressure

I'm operating a fairly small Ceph cluster, currently three nodes (with plans to expand to five in the next couple of months), with more than adequate hardware. Node specs:

2x Xeon E5-2630
64 GB RAM
2x SSDs in RAID1 for the system
2x 256 GB SSDs for journals
4x 4 TB drives for OSDs
1GbE frontend network (shared with the rest of my app servers, etc.)
10GbE cluster network (dedicated switch, used only by the Ceph storage nodes)

I am using CephFS along with the object store (with RadosGW in front of it), but the problem was already present when I was using only CephFS. CephFS serves as a shared datastore for two low-volume OSM map tile servers so they can share a tile cache. Usage isn't heavy and is mostly reads. Here's a typical output from ceph status:

https://gist.github.com/kingcu/499c3d9373726e5c7a95
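
For anyone who wants the detail behind the WARN, it should be retrievable with something like the following; "mds.<name>" is a placeholder for the active MDS daemon, and the ceph daemon commands need to run on the node that has its admin socket:

    ceph health detail
    ceph daemon mds.<name> session ls    # client sessions, including how many caps each holds
    ceph daemon mds.<name> perf dump     # includes the MDS inode and cap counters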

Here's my current ceph.conf:

https://gist.github.com/kingcu/78ab0fe8669b7acb120c

I've upped the MDS cache size, as recommended in some older threads on this list, which helped for a while. Operating in this WARN state doesn't seem to cause any practical problems; the cluster has been in production for a couple of months without incident. However, I'm starting to migrate other data into the Ceph cluster, and before taking the final plunge with critical data I'd like to get a handle on this issue. Suggestions are appreciated!
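
For reference, the cache-size change amounts to roughly this in ceph.conf (the value here is illustrative rather than my exact production setting, which is in the gist above; the option counts inodes, with a default of 100000):

    [mds]
        # number of inodes the MDS may cache; illustrative value only
        mds cache size = 1000000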


Cheers,

Cullen
