I've got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS. All of the OSD hosts also mount CephFS at /ceph, and I've got Graphite pointing at a space under /ceph. Over the weekend I drove almost 2 million metrics, each of which creates a ~3 MB file in a hierarchical path and receives one datapoint per minute. CephFS seemed to handle the writes OK while I was driving load. The files for each metric are at paths like this:

/ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp

Today, however, with the load generator still running, reading file metadata (e.g. directory entries and stat(2) info, presumably MDS-managed data) seems nearly impossible, especially deeper into the tree. For example, in a shell, cd seems to work but ls hangs, seemingly indefinitely. After turning off the load generator and allowing a while for things to settle down, everything behaves better again.

ceph status and ceph health both return good statuses the entire time. During load generation the ceph-mds process seems pegged between 100% and 150% CPU; with load generation turned off, its usage varies widely, from near-idle up to the same 100-150%.

Hopefully I've just missed something in the CephFS tuning, but I'm looking for direction on figuring out whether this is indeed a tuning problem or a symptom of the "not ready for production" banner in the documentation.
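In case it helps make "ls hangs" concrete, here's a rough sketch of the kind of check I mean (not part of our tooling, just a quick probe; the base path is the example subtree above, adjust as needed). It walks down one branch of the whisper tree and times a stat() and a directory listing at each level, so the slow/hung levels show up immediately:

    #!/usr/bin/env python
    # Probe metadata latency down one branch of the whisper tree.
    # Under load, the deeper listdir() calls are the ones that stall.
    import os
    import time

    BASE = "/ceph/whisper/sandbox/cephtest-osd0013"  # example subtree from above

    def timed(label, fn, path):
        """Run fn(path) and print how long it took."""
        t0 = time.time()
        fn(path)
        print("%-8s %8.3fs  %s" % (label, time.time() - t0, path))

    path = BASE
    while os.path.isdir(path):
        timed("stat", os.stat, path)          # inode/attr lookup (MDS caps)
        timed("listdir", os.listdir, path)    # directory entries (MDS readdir)
        subdirs = sorted(e for e in os.listdir(path)
                         if os.path.isdir(os.path.join(path, e)))
        if not subdirs:
            break                              # reached a leaf dir of .wsp files
        path = os.path.join(path, subdirs[0])  # descend one level and repeat

Running that with and without the load generator going is how I've been comparing the two states described above.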