Hi guys,

The documentation only mentions ms_type = async+rdma, but there are several other RDMA options that it does not cover. I listed them with:

    ceph config show-with-defaults osd.0 | grep rdma

    ms_async_rdma_buffer_size        131072
    ms_async_rdma_cm                 false
    ms_async_rdma_device_name
    ms_async_rdma_dscp               96
    ms_async_rdma_enable_hugepage    false
    ms_async_rdma_gid_idx            0
    ms_async_rdma_local_gid
    ms_async_rdma_polling_us         1000
    ms_async_rdma_port_num           1
    ms_async_rdma_receive_buffers    32768
    ms_async_rdma_receive_queue_len  4096
    ms_async_rdma_roce_ver           1
    ms_async_rdma_send_buffers       1024
    ms_async_rdma_sl                 3
    ms_async_rdma_support_srq        true
    ms_async_rdma_type               ib

When I checked the Ceph GitHub repository, I found that these options are defined with "with_legacy: true":
https://github.com/ceph/ceph/blob/main/src/common/options/global.yaml.in

    - name: ms_async_rdma_device_name
      type: str
      level: advanced
      with_legacy: true
    - name: ms_async_rdma_enable_hugepage
      type: bool
      level: advanced
      default: false
      with_legacy: true
    - name: ms_async_rdma_buffer_size
      type: size
      level: advanced
      default: 128_K
      with_legacy: true
    - name: ms_async_rdma_send_buffers
      type: uint
      level: advanced
      default: 1_K
      with_legacy: true
    # size of the receive buffer pool, 0 is unlimited
    - name: ms_async_rdma_receive_buffers
      type: uint
      level: advanced
      default: 32_K
      with_legacy: true
    # max number of wr in srq
    - name: ms_async_rdma_receive_queue_len
      type: uint
      level: advanced
      default: 4_K
      with_legacy: true
    # support srq
    - name: ms_async_rdma_support_srq
      type: bool
      level: advanced
      default: true
      with_legacy: true
    - name: ms_async_rdma_port_num
      type: uint
      level: advanced
      default: 1
      with_legacy: true
    - name: ms_async_rdma_polling_us
      type: uint
      level: advanced
      default: 1000
      with_legacy: true
    - name: ms_async_rdma_gid_idx
      type: int
      level: advanced
      desc: use gid_idx to select GID for choosing RoCEv1 or RoCEv2
      default: 0
      with_legacy: true
    # GID format: "fe80:0000:0000:0000:7efe:90ff:fe72:6efe", no zero folding
    - name: ms_async_rdma_local_gid
      type: str
      level: advanced
      with_legacy: true
    # 0=RoCEv1, 1=RoCEv2, 2=RoCEv1.5
    - name: ms_async_rdma_roce_ver
      type: int
      level: advanced
      default: 1
      with_legacy: true
    # in RoCE, this means PCP
    - name: ms_async_rdma_sl
      type: int
      level: advanced
      default: 3
      with_legacy: true
    # in RoCE, this means DSCP
    - name: ms_async_rdma_dscp
      type: int
      level: advanced
      default: 96
      with_legacy: true
    # when there are enough accept failures, indicating there are unrecoverable failures,
    # just do ceph_abort() . Here we make it configurable.
    - name: ms_max_accept_failures
      type: int
      level: advanced
      desc: The maximum number of consecutive failed accept() calls before considering
        the daemon is misconfigured and abort it.
      default: 4
      with_legacy: true
    # rdma connection management
    - name: ms_async_rdma_cm
      type: bool
      level: advanced
      default: false
      with_legacy: true
    - name: ms_async_rdma_type
      type: str
      level: advanced
      default: ib
      with_legacy: true

This causes confusion, and the RDMA setup needs more detail in the documentation.

Regards

On Mon, Apr 8, 2024 at 10:06 AM Vahideh Alinouri <vahideh.alinouri@xxxxxxxxx> wrote:
>
> Hi guys,
>
> I need to set up Ceph over RDMA, but I have run into many issues.
> Information about my cluster:
> Ceph version is Reef.
> Network cards are Broadcom RDMA NICs.
> RDMA connectivity between the OSD nodes is OK.
>
> The only thing I found in the documentation is the ms_type = async+rdma setting, and I applied it using:
>     ceph config set global ms_type async+rdma
> After this, the cluster crashed. To bring the cluster back, I did the following:
> Put ms_type async+posix in ceph.conf
> Restarted all MON services
>
> The cluster is back, but I don't have any active mgr, and all OSDs are down too.
> Is there a specific order of steps for setting up Ceph over RDMA?
> Thanks

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
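The defaults in the global.yaml.in excerpt quoted above use size suffixes such as 128_K, while show-with-defaults prints plain numbers, so it is not obvious at a glance whether anything on osd.0 differs from the declared defaults. Below is a minimal Python sketch (not a Ceph tool) that parses the two-column output exactly as posted in this thread, expands the suffixes, and reports any option whose value differs from its declared default. It assumes _K means x1024, which matches 128_K being shown as 131072. The helper names (expand, parse_show, non_default, dscp_from_traffic_class) are hypothetical. It also decodes the ms_async_rdma_dscp default under a labeled assumption: DSCP is a 6-bit field (0-63), so a default of 96 only makes sense if it is really the full 8-bit IP traffic-class byte, where DSCP occupies the upper six bits.

```python
"""Compare `ceph config show-with-defaults osd.0 | grep rdma` output (as posted
in this thread) with the defaults declared in src/common/options/global.yaml.in.
Illustrative sketch only; all helper names are hypothetical."""

# Defaults as written in global.yaml.in; "" means the option declares no default.
YAML_DEFAULTS = {
    "ms_async_rdma_buffer_size": "128_K",
    "ms_async_rdma_cm": "false",
    "ms_async_rdma_device_name": "",
    "ms_async_rdma_dscp": "96",
    "ms_async_rdma_enable_hugepage": "false",
    "ms_async_rdma_gid_idx": "0",
    "ms_async_rdma_local_gid": "",
    "ms_async_rdma_polling_us": "1000",
    "ms_async_rdma_port_num": "1",
    "ms_async_rdma_receive_buffers": "32_K",
    "ms_async_rdma_receive_queue_len": "4_K",
    "ms_async_rdma_roce_ver": "1",
    "ms_async_rdma_send_buffers": "1_K",
    "ms_async_rdma_sl": "3",
    "ms_async_rdma_support_srq": "true",
    "ms_async_rdma_type": "ib",
}

# Two-column output as posted (name, then value if one is set).
SHOW_OUTPUT = """\
ms_async_rdma_buffer_size 131072
ms_async_rdma_cm false
ms_async_rdma_device_name
ms_async_rdma_dscp 96
ms_async_rdma_enable_hugepage false
ms_async_rdma_gid_idx 0
ms_async_rdma_local_gid
ms_async_rdma_polling_us 1000
ms_async_rdma_port_num 1
ms_async_rdma_receive_buffers 32768
ms_async_rdma_receive_queue_len 4096
ms_async_rdma_roce_ver 1
ms_async_rdma_send_buffers 1024
ms_async_rdma_sl 3
ms_async_rdma_support_srq true
ms_async_rdma_type ib
"""

# Assumption: size suffixes are binary multipliers (128_K is shown as 131072).
SUFFIXES = {"_K": 1024, "_M": 1024 ** 2, "_G": 1024 ** 3}


def expand(value: str) -> str:
    """Turn a YAML default such as '128_K' into the plain number Ceph prints."""
    for suffix, factor in SUFFIXES.items():
        if value.endswith(suffix):
            return str(int(value[: -len(suffix)]) * factor)
    return value


def parse_show(text: str) -> dict[str, str]:
    """Map option name -> value; a name with nothing after it maps to ''."""
    options = {}
    for line in text.splitlines():
        name, _, value = line.strip().partition(" ")
        if name:
            options[name] = value.strip()
    return options


def non_default(show: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {name: (declared default, shown value)} for every mismatch."""
    diffs = {}
    for name, value in show.items():
        default = expand(YAML_DEFAULTS.get(name, ""))
        if default != value:
            diffs[name] = (default, value)
    return diffs


def dscp_from_traffic_class(tc: int) -> int:
    """Assumption: tc is an 8-bit IP traffic-class byte (DSCP in the upper
    six bits, ECN in the lower two), so DSCP is tc shifted right by 2."""
    return tc >> 2


if __name__ == "__main__":
    diffs = non_default(parse_show(SHOW_OUTPUT))
    print(diffs or "all RDMA options match their global.yaml.in defaults")
    print("ms_async_rdma_dscp 96 -> DSCP", dscp_from_traffic_class(96))
```

Run against the output posted above, it finds no differences: every RDMA option on osd.0 still has its declared default. Under the traffic-class assumption, the dscp default of 96 corresponds to DSCP 24.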