I think I do not understand you completely. How long does a live migration take? If I do virsh migrate with VMs on librbd it takes a few seconds; I guess this is mainly the time needed to copy the RAM to the other host. Any additional time it takes in the case of a host failure is related to timeout settings, failure detection, locks being released, and so on (some example commands are sketched at the end of this mail).

-----Original Message-----
From: Mathieu Dupré [mailto:mathieu.dupre@xxxxxxxxxxxxxxxxxxxx]
Sent: Tuesday, 6 October 2020 9:40
To: ceph-users@xxxxxxx
Cc: Bail, Eloi
Subject: Write access delay after OSD & Mon lost

Hi everybody,

Our need is to do VM failover using an image disk over RBD to avoid data loss. We want to limit the downtime as much as possible.

We have:
- Two hypervisors, each with a Ceph Monitor and a Ceph OSD.
- A third machine with a Ceph Monitor and a Ceph Manager.

The VMs are running on QEMU. The VM disks are on a "replicated" RBD pool formed by the two OSDs.
Ceph version: Nautilus
Distribution: Yocto Zeus

The following test is performed: we electrically turn off one hypervisor (and therefore a Ceph Monitor and a Ceph OSD), which causes its VMs to switch to the second hypervisor.

My main issue is that mounting a partition read-write is very slow in the failover case (after the loss of an OSD and its monitor).

With failover we can write to the device after ~25 s:
[ 25.609074] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)

In a normal boot we can write to the device after ~4 s:
[ 3.087412] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)

I wasn't able to reduce this time by tweaking Ceph settings. I am wondering if someone could help me with that.

Here is our configuration.

ceph.conf:
[global]
fsid = fa7a17d1-5351-459e-bf0e-07e7edc9a625
mon initial members = hypervisor1,hypervisor2,observer
mon host = 192.168.217.131,192.168.217.132,192.168.217.133
public network = 192.168.217.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 2
osd pool default min size = 1
osd crush chooseleaf type = 1
mon osd adjust heartbeat grace = false
mon osd min down reporters = 1

[mon.hypervisor1]
host = hypervisor1
mon addr = 192.168.217.131:6789

[mon.hypervisor2]
host = hypervisor2
mon addr = 192.168.217.132:6789

[mon.observer]
host = observer
mon addr = 192.168.217.133:6789

[osd.0]
host = hypervisor1
public_addr = 192.168.217.131
cluster_addr = 192.168.217.131

[osd.1]
host = hypervisor2
public_addr = 192.168.217.132
cluster_addr = 192.168.217.13

# ceph config dump
WHO     MASK  LEVEL     OPTION                            VALUE     RO
global        advanced  mon_osd_adjust_down_out_interval  false
global        advanced  mon_osd_adjust_heartbeat_grace    false
global        advanced  mon_osd_down_out_interval         5
global        advanced  mon_osd_report_timeout            4
global        advanced  osd_beacon_report_interval        1
global        advanced  osd_heartbeat_grace               2
global        advanced  osd_heartbeat_interval            1
global        advanced  osd_mon_ack_timeout               1.000000
global        advanced  osd_mon_heartbeat_interval        2
global        advanced  osd_mon_report_interval           3

Thanks
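
As a reference point, here is roughly what I run when I time a live migration by hand. It is only a sketch: the domain name vm1 and the destination hypervisor2 are placeholders for your own names, and since the disk stays on RBD only the RAM and device state get copied:

# time virsh migrate --live --persistent --undefinesource vm1 qemu+ssh://hypervisor2/system

On a healthy cluster this normally finishes in a few seconds for a small VM, which is the number I was referring to above.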
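
And to check whether your extra ~20 seconds comes from the exclusive lock still being held by the dead client, rather than from the OSD/mon timeouts you already tuned, you could inspect the image right after the failover. Again just a sketch, untested on your setup; rbd/vm-disk is a placeholder for your actual pool/image, and note that on Nautilus the blocklist is still called "blacklist":

# rbd info rbd/vm-disk | grep features   # is the exclusive-lock feature enabled?
# rbd status rbd/vm-disk                 # watchers still registered by the dead host
# rbd lock list rbd/vm-disk              # current lock holder, if any
# ceph osd blacklist ls                  # dead-client entries added so the lock could be taken over

If the write only succeeds once the old watcher/lock entry disappears, the delay is the lock takeover rather than the mount itself.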