Hey Folks,

For background:

- I am working with a small (home-lab) Ceph cluster on a Proxmox cluster.
- Proxmox uses shared cluster storage to pass configuration files around, including ceph.conf.
- All nodes are connected with Mellanox ConnectX-3 (mlx4_core) 56GbE cards through a QSFP switch.
- I am experimenting with Mellanox ConnectX-4 (mlx5_core) 100GbE cards between two nodes, primarily for fun and to learn, as I can't afford a 100GbE switch for a hobby.
  -- (The two 100GbE cards are direct-attached to one another.)
- Lastly, Ceph is using the (unsupported?) RDMA config (`ms_async_transport_type = rdma`, `ms_bind_ipv4 = true`, `ms_cluster_type = async+rdma`).

My question:

Each Ceph OSD is told which RDMA device it should be reached at by naming that device explicitly, for example:

```
[osd.0]
ms_async_rdma_device_name = rocep8s0
```

I am having trouble working out how to let the two nodes with multiple RDMA devices communicate with one another over the 100GbE link, without disrupting the ability of the rest of the cluster to keep communicating over the lower-throughput RDMA device.

For example, `rdma dev` on one of those nodes shows:

```
0: rocep68s0: node_type ca fw 12.16.1020 node_guid 7cfe:9003:0026:88ac sys_image_guid 7cfe:9003:0026:88ac
1: rocep65s0: node_type ca fw 2.42.5000 node_guid 0002:c903:0042:dff0 sys_image_guid 0002:c903:0042:dff3
```

If ms_async_rdma_device_name is set to either device, only partial connectivity is possible (a sketch of both choices is in the P.S. below).

What is the correct method for "routing" RDMA / RoCE?

Josh
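P.S. To make the trade-off concrete, here is a minimal ceph.conf sketch of the two pinning choices on the dual-card node, reusing the osd.0 example and the two device names from the `rdma dev` output above (the OSD-to-card assignment is hypothetical); it only restates the dilemma, not a solution:

```
# ceph.conf fragment for the node that has both cards (hypothetical assignment).
# ms_async_rdma_device_name names a single device, so each choice gives up one fabric.

[osd.0]
# Choice A: pin to the ConnectX-4 100GbE card (fw 12.x) -- only the
# direct-attached peer node can reach this OSD over RDMA.
ms_async_rdma_device_name = rocep68s0

# Choice B: pin to the ConnectX-3 56GbE card (fw 2.42.5000) -- the switched
# part of the cluster keeps working, but the 100GbE link goes unused.
#ms_async_rdma_device_name = rocep65s0
```

Either way, the single device name forces all of that OSD's RDMA traffic onto one card, which is exactly the partial connectivity described above.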