Re: Cephfs metadata and MDS on same node

> On Mar 26, 2021, at 6:31 AM, Stefan Kooman <stefan@xxxxxx> wrote:
> 
> On 3/9/21 4:03 PM, Jesper Lykkegaard Karlsen wrote:
>> Dear Ceph’ers
>> I am about to upgrade the MDS nodes for CephFS in the Ceph cluster (erasure-coded 8+3) I am administering.
>> Since they will get plenty of memory and CPU cores, I was wondering if it would be a good idea to move the metadata OSDs (NVMe, currently on the OSD nodes together with the cephfs_data OSDs (HDD)) to the MDS nodes?
>> Configured as:
>> 4 x MDS, each with a metadata OSD, configured with 4 x replication,
>> so each metadata OSD would hold a complete copy of the metadata.
>> I know the MDS keeps a lot of metadata in RAM, but if the metadata OSDs were on the MDS nodes, would that not bring down latency?
>> Anyway, I am just asking for your opinion on this: pros and cons, or even better, input from somebody who has actually tried this.
> 
> I doubt you'll gain a lot from this. Data still has to be replicated, so network latency remains. And reads are served by the primary OSDs of the CephFS metadata pool, so you would only see gains if you could make all the primary OSDs sit on the single active MDS node. But you would have to do manual tuning with upmap to achieve that.

FWIW, I think primary affinity, rather than upmap, would be the way to do this, though the net result might be mixed, since ops would be directed at only 25% of the OSDs: it trades OSD busy-ness against network latency. And as the cluster topology changes, one would need to periodically refresh the affinity values; a rough, untested sketch of that kind of refresh is below.
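For illustration only, here is a minimal Python sketch of such a periodic refresh. It shells out to the ceph CLI, and it assumes the NVMe metadata OSDs are osd.0 through osd.7 and that the active MDS runs on a host named "mds01" in the CRUSH tree; those names are invented for the example, not taken from this thread.

#!/usr/bin/env python3
# Illustrative sketch only: keep primary affinity high for the metadata OSDs
# that live on the (hypothetical) MDS host "mds01" and low for the rest.
# The host name and the osd.0-osd.7 range are assumptions.
import json
import subprocess

MDS_HOST = "mds01"            # hypothetical host carrying the active MDS
METADATA_OSDS = range(0, 8)   # hypothetical NVMe OSDs backing cephfs_metadata

def ceph_json(*args):
    out = subprocess.run(["ceph", *args, "-f", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

# Find the OSD ids that sit under the MDS host in the CRUSH tree.
tree = ceph_json("osd", "tree")
local_osds = set()
for node in tree["nodes"]:
    if node.get("type") == "host" and node.get("name") == MDS_HOST:
        local_osds.update(c for c in node.get("children", []) if c >= 0)

# Prefer local OSDs as primaries; "ceph osd primary-affinity" takes 0.0-1.0.
for osd_id in METADATA_OSDS:
    weight = "1.0" if osd_id in local_osds else "0.1"
    subprocess.run(["ceph", "osd", "primary-affinity", f"osd.{osd_id}", weight],
                   check=True)

Something like this would have to be re-run after topology changes; note that it only nudges primary selection, it does not move any data.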

> I think your money is better spent buying more NVMe disks and spreading the load than on co-locating that on the MDS.

Agreed.  Complex solutions have a way of being more brittle, and of hitting corner cases.  

> If you are planning on multi-active MDS I don't think it would make sense at all.

Unless one provisions multiple filesystems, each pinned to an MDS with a unique set of OSDs (a separate CRUSH root?), with affinities managed independently? I'm not sure that's entirely possible; if it is, it'd be an awful lot of complexity. A very rough sketch of what it might look like follows.
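Purely as a thought experiment (untested, and every name here, roots, rules, pools, filesystem and MDS ids, is invented for the example), the per-filesystem separation might look roughly like this, again scripted against the ceph CLI:

#!/usr/bin/env python3
# Thought experiment only: one filesystem per MDS, each metadata pool placed
# on its own CRUSH root via a dedicated rule, and mds_join_fs (Octopus+) used
# to keep a given MDS daemon on "its" filesystem. All names are invented.
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

ceph("fs", "flag", "set", "enable_multiple", "true")

for i in range(1, 5):                          # four filesystems, fs1..fs4
    root = f"meta-root-{i}"                    # assumes these CRUSH roots already exist
    rule = f"meta-rule-{i}"
    ceph("osd", "crush", "rule", "create-replicated", rule, root, "host", "nvme")
    ceph("osd", "pool", "create", f"cephfs{i}_metadata", "32")
    ceph("osd", "pool", "set", f"cephfs{i}_metadata", "crush_rule", rule)
    ceph("osd", "pool", "create", f"cephfs{i}_data", "128")
    ceph("fs", "new", f"fs{i}", f"cephfs{i}_metadata", f"cephfs{i}_data")
    ceph("config", "set", f"mds.mds{i}", "mds_join_fs", f"fs{i}")

Even if that works, each filesystem would still need its own affinity management, which is exactly the kind of complexity warned about above.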
 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



