On Mon, 17 Nov 2014, Kevin Sumner wrote:
I've got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS. All the OSDs also mount CephFS at /ceph. I've got Graphite pointing at a space under /ceph. Over the weekend, I drove almost 2 million metrics, each of which creates a ~3MB file in a hierarchical path and sends a datapoint into that file once a minute. CephFS seemed to handle the writes ok while I was driving load. The metric files all live at paths like this:
  /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp
Today, however, with the load generator still running, reading metadata of files (e.g. directory entries and stat(2) info) in the filesystem (presumably MDS-managed data) seems nearly impossible, especially deeper into the tree. For example, in a shell cd seems to work but ls hangs, seemingly indefinitely. After turning off the load generator and allowing a while for things to settle down, everything seems to behave better.
ceph status and ceph health both return good statuses the entire time. During load generation, the ceph-mds process seems pegged at between 100% and 150% CPU, but with load generation turned off, its usage varies widely, from near-idle up to a similar 100-150% CPU.
Hopefully, I've missed something in the CephFS tuning. However, I'm looking for direction on figuring out if it is, indeed, a tuning problem or if this behavior is a symptom of the "not ready for production" banner in the documentation.
My first guess is that the MDS cache is just too small and it is thrashing. Try

  ceph mds tell 0 injectargs '--mds-cache-size 1000000'

That's 10x bigger than the default, though be aware that it will eat up 10x as much RAM too.

We've also seen the cache behave in a non-optimal way when evicting things, making it thrash more often than it should. I'm hoping we can implement something like MQ instead of our two-level LRU, but it isn't high on the priority list right now.

sage
Thanks! I’ll pursue mds cache size tuning. Is there any guidance on setting the cache and other mds tunables correctly, or is it an adjust-and-test sort of thing? Cursory searching doesn’t return any relevant documentation for ceph.com. I’m plowing through some other list posts now.
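
As a rough starting point for the adjust-and-test approach, the cache-size suggestion above can be compared against the number of metric files described in the thread. The sketch below is only back-of-the-envelope arithmetic, not a Ceph tool: it assumes one cache entry per metric file, ignores directory entries entirely, and derives the default cache size from the "10x bigger than the default" remark rather than from the Ceph documentation. The function name cache_coverage is hypothetical.

```python
# Rough check of how much of the metric tree could sit in the MDS cache at once.
# All figures come from the thread; treating one cache entry as one file and
# ignoring directories are simplifying assumptions.

def cache_coverage(cache_entries: int, file_count: int) -> float:
    """Return the fraction of files that could be cached at once, capped at 1.0."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    return min(1.0, cache_entries / file_count)

METRIC_FILES = 2_000_000               # "almost 2 million metrics", one file each
SUGGESTED_CACHE = 1_000_000            # value passed to --mds-cache-size
DEFAULT_CACHE = SUGGESTED_CACHE // 10  # inferred from "10x bigger than the default"

for label, size in (("default", DEFAULT_CACHE), ("suggested", SUGGESTED_CACHE)):
    share = cache_coverage(size, METRIC_FILES)
    print(f"{label:>9}: {size:>9,} entries -> {share:.0%} of metric files")
```

Under these assumptions the default covers about 5% of the files and the suggested value about 50%, so a workload that touches every file once a minute may still thrash after the bump. The actual behavior depends on how the MDS counts and evicts entries, which this sketch does not model.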