Making mds cache size 5 million seems to have helped significantly, but we’re still occasionally seeing issues with metadata reads under load. Settings above 5 million don’t seem to have any noticeable impact on the problem. I’m starting the upgrade to Giant today.
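A minimal sketch of making a value like this stick across MDS restarts, assuming a single MDS at rank 0 and the stock /etc/ceph/ceph.conf layout (adjust for your deployment):

    # Apply at runtime on the active MDS (rank 0), as suggested earlier in the thread:
    ceph mds tell 0 injectargs '--mds-cache-size 5000000'

    # To keep the setting across MDS restarts (and the Giant upgrade), the
    # equivalent line goes in the [mds] section of ceph.conf on the MDS host:
    #
    #   [mds]
    #       mds cache size = 5000000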
Hi Thomas,
I looked over the mds config reference a bit yesterday, but mds cache size seems to be the most relevant tunable.
As suggested, I upped mds-cache-size to 1 million yesterday and started the load generator. During load generation, we’re seeing similar behavior on the filesystem and the mds. The mds process is running a little hotter now, with a higher average CPU and an 11GB resident size (it was just under 10GB before, iirc). Enumerating files on the filesystem, e.g. with ls, is still hanging though.
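Figures like that resident size can be sampled with something as simple as the following, assuming a single ceph-mds process on the MDS host (nothing here is Ceph-specific, just generic Linux tooling):

    # Resident memory (KB) and CPU of the MDS, sampled before and after a cache bump:
    ps -o pid=,rss=,pcpu=,cmd= -C ceph-mds

    # Or watch it over time while the load generator runs:
    top -b -n 1 | grep ceph-mds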
With load generation disabled, the behavior is the same as before, i.e., things work as expected.
I’ve got a lot of memory and CPU headroom on the box hosting the mds, so unless there’s a good reason not to, I plan to continue increasing the mds cache iteratively in the hope of finding a size that produces good behavior. Right now, I’d expect us to hit around 2 million inodes each minute, so a cache of 1 million is still undersized. If that doesn’t work: we’re currently running Firefly on the cluster, and I’ll be upgrading it to Giant.
--
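A rough sketch of how each step of that iteration could be checked, assuming the MDS admin socket is at its default location, the daemon name matches the short hostname, and this release exposes the usual inode counters under perf dump (counter names may differ between Firefly and Giant):

    # Confirm the injected value actually took effect on the running MDS:
    ceph daemon mds.$(hostname -s) config get mds_cache_size

    # Watch how full the cache gets while the load generator runs; if the inode
    # count sits pinned at the configured limit, the cache is still undersized
    # for the ~2 million inodes touched per minute:
    ceph daemon mds.$(hostname -s) perf dump | grep -i inode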
Hi Kevin,

All of the MDS tunables (I think) are listed on this page with a short description: http://ceph.com/docs/master/cephfs/mds-config-ref/

Can you tell us how your cluster behaves after the mds-cache-size change? What is your MDS RAM consumption, before and after?

Thanks!

-- Thomas Lemarchand
Cloud Solutions SAS - Head of Information Systems

On Mon, 2014-11-17 at 16:06 -0800, Kevin Sumner wrote:

On Nov 17, 2014, at 15:52, Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Mon, 17 Nov 2014, Kevin Sumner wrote:
I’ve got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS. All the OSDs also mount CephFS at /ceph. I’ve got Graphite pointing at a space under /ceph. Over the weekend, I drove almost 2 million metrics, each of which creates a ~3MB file at a hierarchical path and receives a datapoint once a minute. CephFS seemed to handle the writes ok while I was driving load. The file for each metric lives at a path like this: /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp
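For anyone wanting to stand up a (much smaller) approximation of that layout, a rough stand-in might look like the loop below; the host names, counts, and file sizes are illustrative only and not the actual load generator:

    # Hypothetical, scaled-down stand-in for the whisper file hierarchy described above.
    for host in cephtest-osd0001 cephtest-osd0002; do
        for i in $(seq 1 100); do
            d=/ceph/whisper/sandbox/$host/$((i % 10))/$((i % 7))/$((i % 5))
            mkdir -p "$d"
            dd if=/dev/zero of="$d/$i.wsp" bs=1M count=3 2>/dev/null   # ~3MB per metric file
        done
    done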
Today, however, with the load generator still running, reading file metadata (e.g. directory entries and stat(2) info) in the filesystem (presumably MDS-managed data) seems nearly impossible, especially deeper into the tree. For example, in a shell, cd seems to work but ls hangs, seemingly indefinitely. After turning off the load generator and allowing a while for things to settle down, everything seems to behave better.
ceph status and ceph health both return good statuses the entire time. During load generation, the ceph-mds process seems pegged between 100% and 150% CPU, but with load generation turned off, the process varies from near-idle up to a similar 100-150% CPU.
Hopefully, I’ve missed something in the CephFS tuning. However, I’m looking for direction on figuring out if it is, indeed, a tuning problem or if this behavior is a symptom of the “not ready for production” banner in the documentation.
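One way to put numbers on that hang rather than eyeballing it, using the example path from earlier in this message (again just generic shell tooling):

    # Time a listing and a single stat, both with the load generator running
    # and with it stopped, deep in the tree:
    time ls /ceph/whisper/sandbox/cephtest-osd0013/2/3/4 > /dev/null
    time stat /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp > /dev/null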
My first guess is that the MDS cache is just too small and it is thrashing. Try
ceph mds tell 0 injectargs '--mds-cache-size 1000000'
That's 10x bigger than the default, though be aware that it will eat up 10x as much RAM too.
We've also seen the cache behave in a non-optimal way when evicting things, making it thrash more often than it should. I'm hoping we can implement something like MQ instead of our two-level LRU, but it isn't high on the priority list right now.
sage
Thanks! I’ll pursue mds cache size tuning. Is there any guidance on setting the cache and other mds tunables correctly, or is it an adjust-and-test sort of thing? Cursory searching doesn’t turn up any relevant documentation on ceph.com. I’m plowing through some other list posts now.

-- Kevin Sumner
kevin@xxxxxxxxx
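It does tend to be an adjust-and-test exercise; one pass might look like the sketch below, where the candidate sizes, the rank-0 MDS, the half-hour soak, and the hostname-based daemon name are all assumptions to adapt:

    # One adjust-and-test pass over a few candidate cache sizes:
    for size in 1000000 2000000 5000000; do
        ceph mds tell 0 injectargs "--mds-cache-size $size"
        sleep 1800    # let the load generator run against the new size for a while
        echo "== mds cache size = $size =="
        ceph daemon mds.$(hostname -s) perf dump | grep -i inode    # is the cache pinned at the limit?
    done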
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com