Hello,

I'm running KVM virtualization with RBD storage, and some images in the RBD pool become effectively unusable after a VM restart. All I/O to a problematic RBD image blocks indefinitely. I have checked that it is not a permission or locking problem.

The bug was silent until we performed a planned restart of a few VMs and some of them failed to start (the kvm process timed out). It could be related to the recent upgrades from Luminous to Nautilus and from Proxmox 5 to 6.

The Ceph backend is clean, with no observable problems; all mons/mgrs/osds are up and running. The network is fine. There is nothing in the logs relevant to the problem.

ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
kernel 5.3.13-2-pve #1 SMP PVE 5.3.13-2 (Fri, 24 Jan 2020 09:49:36 +0100) x86_64 GNU/Linux
HEALTH_OK

No locks:

# rbd status rbd-technet/vm-402-disk-0
Watchers: none
# rbd status rbd-technet/vm-402-disk-1
Watchers: none

Normal image vs. problematic one:

# rbd object-map check rbd-technet/vm-402-disk-0
Object Map Check: 100% complete...done.
# rbd object-map check rbd-technet/vm-402-disk-1
^C

disk-0 is good, while disk-1 is effectively lost. The check on disk-1 hangs for many minutes with no visible activity, so I interrupted it.

rbd export runs without problems; however, some data turns out to be lost after the image is imported back (ext4 errors). rbd deep copy worked for me: the copy looks good, no errors.

# rbd info rbd-technet/vm-402-disk-1
rbd image 'vm-402-disk-1':
        size 16 GiB in 4096 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: c600d06b8b4567
        block_name_prefix: rbd_data.c600d06b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        op_features:
        flags:
        create_timestamp: Fri Jan 31 17:50:50 2020
        access_timestamp: Sat Mar 7 00:30:53 2020
        modify_timestamp: Sat Mar 7 00:33:35 2020
        journal: c600d06b8b4567
        mirroring state: disabled

What can be done to debug this problem?

Thanks,
Ilia.
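
P.S. If client-side logs would help, I can re-run the hanging check with debug logging enabled, roughly like this (just a sketch; I'm assuming the usual Ceph client config overrides --debug-rbd, --debug-ms and --log-file also work when passed to the rbd CLI, and the log path is only an example):

# rbd object-map check rbd-technet/vm-402-disk-1 --debug-rbd=20 --debug-ms=1 --log-file=/tmp/rbd-object-map-check.log

I could then attach the resulting log here if that is the right way to gather more detail.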