Hello folks!

I'm designing a new Ceph cluster from scratch and I want to increase CephFS speed and decrease latency. Usually I build my OSD nodes with WAL+DB on NVMe in front of SAS/SATA SSDs, and I deploy the MDS and MON daemons on those same servers. This time an unusual idea came to mind, and on paper (with my limited knowledge) I think it has great potential and should perform better.

I have 5 racks, and the 3rd "middle" rack is my storage and management rack:

- In RACK-3 I'm going to locate 8x 1U OSD servers (spec: 2x E5-2690v4, 256GB RAM, 4x 25G, 2x 1.6TB PCIe NVMe "MZ-PLK3T20", 8x 4TB SATA SSD).
- My CephFS kernel clients are 40x GPU nodes located in RACK-1, 2, 4 and 5.

With my current workflow, every client request:

1- visits the rack data switch,
2- jumps to the main VPC switch via 2x 100G,
3- talks to the MDS servers,
4- goes back to the client with the answer,
5- and to access data it follows the same hops and visits the OSDs every time.

If I instead deploy a separate metadata pool served by 4x MDS servers at the top of RACK-1, 2, 4 and 5 (spec: 2x E5-2690v4, 128GB RAM, 2x 10G public, 2x 25G cluster, 2x 960GB U.2 NVMe "MZ-PLK3T20"), then every client reaches an in-rack MDS that is only one hop away. If the request is metadata-only, the MDS can usually answer from its cache without touching the OSD nodes in RACK-3 at all, and since the metadata pool OSDs would live on the NVMe drives of those same in-rack servers, even the MDS journal writes stay inside the rack (rough command sketches at the end of this mail).

Spreading the MDS servers with a separate metadata pool across all the racks should also take a lot of load off the main VPC switch in RACK-3.

If I'm not missing anything, only the recovery workload will suffer with this topology. What do you think?
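For completeness, the "usual build" I mention above is just the classic ceph-volume layout, roughly like this per OSD node (a minimal sketch; device paths are placeholders for the hardware above, and WAL colocates on the DB device when only --db-devices is given):

    # 8 SATA SSDs as OSDs, DB+WAL carved out of the 2 NVMe devices.
    ceph-volume lvm batch --bluestore \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi \
        --db-devices /dev/nvme0n1 /dev/nvme1n1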
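To make the metadata pool part concrete, this is roughly what I have in mind, assuming the 2x 960GB U.2 drives on each MDS node are deployed as normal OSDs carrying the device class "nvme" (rule names, pool names and PG counts are placeholders):

    # CRUSH rule that keeps metadata replicas on NVMe OSDs only, one per host.
    ceph osd crush rule create-replicated meta-nvme default host nvme

    # CRUSH rule for the bulk data on the RACK-3 SATA SSDs.
    ceph osd crush rule create-replicated data-ssd default host ssd

    # Dedicated pools and the filesystem on top of them.
    ceph osd pool create cephfs_metadata 64 64 replicated meta-nvme
    ceph osd pool create cephfs_data 1024 1024 replicated data-ssd
    ceph fs new cephfs cephfs_metadata cephfs_data

One caveat I'm aware of: with failure domain "host" and only one NVMe/MDS node per rack, a size-3 metadata pool always spans three racks, so metadata replica writes still cross the VPC switch over the cluster network; it's the client-facing read/cache path that stays in-rack.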
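And on the MDS side, with one MDS per client rack, something like this (filesystem name and directory paths are placeholders):

    # One active rank per client rack; with only 4 MDS daemons this
    # leaves no standby, so 3 active + 1 standby may be the safer split.
    ceph fs set cephfs max_mds 4

    # Pin each rack's working directory to "its" rank, so the kernel
    # clients in that rack normally talk to the in-rack daemon.
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/rack1
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/rack2

Without pinning, clients talk to whichever rank is authoritative for a given subtree, so the one-hop benefit only holds if each rack's jobs mostly stay under their own directory.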