> On 22 August 2016 at 15:52, Christian Balzer <chibi@xxxxxxx> wrote:
>
>
> Hello,
>
> first off, not a CephFS user, just installed it on a lab setup for fun.
> That being said, I tend to read most posts here.
>
> And I do remember participating in similar discussions.
>
> On Mon, 22 Aug 2016 14:47:38 +0200 Burkhard Linke wrote:
>
> > Hi,
> >
> > we are running CephFS with about 70 TB of data, more than 5 million
> > files and about 100 clients. The MDS is currently colocated on a
> > storage box with 14 OSDs (12 HDD, 2 SSD). The box has two E5-2680 v3
> > CPUs and 128 GB RAM. CephFS runs fine, but it feels like the metadata
> > operations may need more speed.
> >
>
> Firstly, I wouldn't share the MDS with a storage/OSD node, a MON
> node would make a more "natural" co-location spot.

Indeed. I always try to avoid co-locating anything with the OSDs.

> That being said, CPU wise that machine feels vastly overpowered, don't see
> more than half of the cores utilized ever for OSD purposes, even in the
> most contrived test cases.
>
> Have you monitored that node with something like atop to get a feel what
> tasks are using how much (of a specific) CPU?
>
> > Excerpt of MDS perf dump:
> > "mds": {
> >     "request": 73389282,
> >     "reply": 73389282,
> >     "reply_latency": {
> >         "avgcount": 73389282,
> >         "sum": 259696.749971457
> >     },
> >     "forward": 0,
> >     "dir_fetch": 4094842,
> >     "dir_commit": 720085,
> >     "dir_split": 0,
> >     "inode_max": 5000000,
> >     "inodes": 5000065,
> >     "inodes_top": 320979,
> >     "inodes_bottom": 530518,
> >     "inodes_pin_tail": 4148568,
> >     "inodes_pinned": 4469666,
> >     "inodes_expired": 60001276,
> >     "inodes_with_caps": 4468714,
> >     "caps": 4850520,
> >     "subtrees": 2,
> >     "traverse": 92378836,
> >     "traverse_hit": 75743822,
> >     "traverse_forward": 0,
> >     "traverse_discover": 0,
> >     "traverse_dir_fetch": 1719440,
> >     "traverse_remote_ino": 33,
> >     "traverse_lock": 3952,
> >     "load_cent": 7339063064,
> >     "q": 0,
> >     "exported": 0,
> >     "exported_inodes": 0,
> >     "imported": 0,
> >     "imported_inodes": 0
> > },....
> >
> > The setup is expected to grow, with regard to the amount of stored data
> > and the number of clients. The MDS process currently consumes about 36
> > TB RAM, with 22 TB resident. Since a large part of the MDS runs single
> > threaded, a CPU with fewer cores and a higher clock frequency might be
> > a better choice in this setup.
> >
> I suppose you mean GB up there. ^o^
>
> If memory serves me well, there are knobs to control MDS memory usage, so
> tuning them upwards may help.
>

You probably mean mds_cache_size. That is the maximum number of inodes the MDS will cache.

Keep in mind that a single inode uses about 4k of memory, so the default of 100k inodes will consume about 400MB. You can increase this to 16,777,216, in which case the cache will use about 64GB at most.

I would still advise putting 128GB of memory in that machine, since the MDS might leak memory at some point and you want to give it some headroom.

Source: http://docs.ceph.com/docs/master/cephfs/mds-config-ref/

> And yes to the less cores, more speed rationale. Up to a point of course.

Indeed. An E5 with faster single-core performance is better for the MDS than a slower CPU with more cores.
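
To make the cache sizing arithmetic above concrete, here is a minimal Python sketch. It assumes the rough figure of 4 KiB per cached inode mentioned above, which is a rule of thumb and not a measured value. The function names are made up for illustration and are not part of Ceph.

```python
# Rough estimate of MDS cache memory from mds_cache_size.
# Assumption (from this thread): one cached inode costs about 4 KiB.
BYTES_PER_INODE = 4 * 1024


def mds_cache_memory_bytes(mds_cache_size, bytes_per_inode=BYTES_PER_INODE):
    """Approximate memory used when the cache holds mds_cache_size inodes."""
    return mds_cache_size * bytes_per_inode


def inodes_for_budget(budget_bytes, bytes_per_inode=BYTES_PER_INODE):
    """Largest mds_cache_size that fits into budget_bytes of memory."""
    return budget_bytes // bytes_per_inode


if __name__ == "__main__":
    gib = 2 ** 30
    # 100k is the default named above, 5M is inode_max from the perf dump,
    # 16,777,216 is the upper value suggested above.
    for size in (100_000, 5_000_000, 16_777_216):
        used = mds_cache_memory_bytes(size)
        print(f"mds_cache_size={size:>10}: about {used / gib:.2f} GiB")
    print("inodes that fit into 64 GiB:", inodes_for_budget(64 * gib))
```

The real per-inode cost depends on the workload and the Ceph version, so treat the result as a starting point and compare it with the resident size the MDS actually reports.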
> Again, checking with atop should give you a better insight there.
>
> Also up there you said metadata stuff feels sluggish, have you considered
> moving that pool to SSDs?

I recall from recent benchmarks that there was no benefit in having the metadata on SSD. Sure, it might help a bit with a journal replay, but I think that regular disks with a proper journal do just fine.

> Or in other words, are you sure your MDS needs more speed instead of the
> storage below it?
>
> > How well does the MDS performance scale with CPU frequency (given other
> > latency paths like network I/O don't matter)? Given the amount of
> > memory used, does the MDS benefit from larger CPU caches (e.g. E5-2XXX
> > class CPU), or is a smaller cache in faster CPUs a better choice (e.g.
> > E5-1XXX or E3-1XXXv5)?
> >
> Nothing solid here really, but I'd suspect that faster code execution will
> beat larger caches, as I doubt that the (variable) meta-data will fit in
> there and be hot enough to benefit from those.
>
> Christian
>
> > Regards,
> > Burkhard
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com