Hi,

yesterday I had to power off some VMs (Proxmox) backed by RBD images for maintenance. After the VMs were off, I tried to create a snapshot, which didn't finish even after half an hour. Because it was a maintenance window anyway, I rebooted all VM nodes and all Ceph nodes - nothing changed. Powering the VMs back on was impossible; KVM exited with a timeout. This happened to two of about 15 VMs.

Two of the three images of one VM still had locks, which I removed, but I was still unable to power it on. I tried to access the image by mapping it with rbd-nbd, which was unsuccessful and logged this:

[ 8601.746971] block nbd0: Connection timed out
[ 8601.747648] block nbd0: shutting down sockets
[ 8601.747653] block nbd0: Connection timed out
[...]
[ 8601.750419] block nbd0: Connection timed out
[ 8601.750831] print_req_error: 121 callbacks suppressed
[ 8601.750832] blk_update_request: I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 8601.751261] buffer_io_error: 182 callbacks suppressed
[ 8601.751262] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 8601.751678] blk_update_request: I/O error, dev nbd0, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[...]
[ 8601.760283] ldm_validate_partition_table(): Disk read failed.
[ 8601.760344] Dev nbd0: unable to read RDB block 0
[ 8601.760985] nbd0: unable to read partition table
[ 8601.761282] nbd0: detected capacity change from 0 to 375809638400
[ 8601.761382] ldm_validate_partition_table(): Disk read failed.
[ 8601.761461] Dev nbd0: unable to read RDB block 0
[ 8601.762145] nbd0: unable to read partition table

The rbd-nbd process kept running and had to be killed. The same thing happened with qemu-nbd. Exporting the image via rbd export worked fine, as did an rbd copy. Any other operation on the image (feature disable/enable) took forever, so I had to abort it. It seems that every operation leaves a lock on the image.

Because it was the middle of the night, I stopped working on it. This morning one of the images was accessible again, the others were not. Does anybody have any hints? For reference, the commands I ran against the affected image are summarized below, followed by some system information.
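Roughly what I did (pool name, lock id and locker below are placeholders, not the exact values from my cluster):

# list and remove the stale locks
rbd lock ls <pool>/vm-29009-disk-2
rbd lock rm <pool>/vm-29009-disk-2 <lock-id> <locker>

# map the image locally - this timed out and produced the kernel messages above,
# and the hanging rbd-nbd process had to be killed afterwards
rbd-nbd map <pool>/vm-29009-disk-2
rbd-nbd unmap /dev/nbd0

# a full export and a copy of the image both worked fine
rbd export <pool>/vm-29009-disk-2 vm-29009-disk-2.raw
rbd copy <pool>/vm-29009-disk-2 <pool>/vm-29009-disk-2-copy

# toggling features (e.g. journaling) hung and had to be aborted
rbd feature disable <pool>/vm-29009-disk-2 journaling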
Regards,
Yves

ceph version 14.2.5 (3ce7517553bdd5195b68a6ffaf0bd7f3acad1647) nautilus (stable)

Primary cluster with a backup cluster (rbd-mirror).

[global]
auth client required = none
auth cluster required = none
auth service required = none
auth supported = none
cephx_sign_messages = false
cephx require signatures = False
cluster_network = 172.16.230.0/24
debug asok = 0/0
debug auth = 0/0
debug bdev = 0/0
debug bluefs = 0/0
debug bluestore = 0/0
debug buffer = 0/0
debug civetweb = 0/0
debug client = 0/0
debug compressor = 0/0
debug context = 0/0
debug crush = 0/0
debug crypto = 0/0
debug dpdk = 0/0
debug eventtrace = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug fuse = 0/0
debug heartbeatmap = 0/0
debug javaclient = 0/0
debug journal = 0/0
debug journaler = 0/0
debug kinetic = 0/0
debug kstore = 0/0
debug leveldb = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug memdb = 0/0
debug mgr = 0/0
debug mgrc = 0/0
debug mon = 0/0
debug monc = 0/00
debug ms = 0/0
debug none = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rbd mirror = 0/0
debug rbd replay = 0/0
debug refs = 0/0
debug reserver = 0/0
debug rgw = 0/0
debug rocksdb = 0/0
debug striper = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
debug xio = 0/0
fsid = 27fdf1bb-22a1-4d5e-9729-780cbdcd33fe
mon_allow_pool_delete = true
mon_host = 172.16.230.142 172.16.230.144 172.16.230.146
mon_osd_down_out_subtree_limit = host
osd_backfill_scan_max = 16
osd_backfill_scan_min = 4
osd_deep_scrub_interval = 1209600
osd_journal_size = 5120
osd_max_backfills = 1
osd_max_trimming_pgs = 1
osd_pg_max_concurrent_snap_trims = 1
osd_pool_default_min_size = 2
osd_pool_default_size = 3
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_op_priority = 1
osd_recovery_threads = 1
osd_scrub_begin_hour = 19
osd_scrub_chunk_max = 1
osd_scrub_chunk_min = 1
osd_scrub_during_recovery = false
osd_scrub_end_hour = 6
osd_scrub_priority = 1
osd_scrub_sleep = 0.5
osd_snap_trim_priority = 1
osd_snap_trim_sleep = 0.005
osd_srub_max_interval = 1209600
public_network = 172.16.230.0/24
max open files = 131072
osd objectstore = bluestore
osd op threads = 2
osd crush update on start = true

Currently inaccessible image:

rbd image 'vm-29009-disk-2':
    size 200 GiB in 51200 objects
    order 22 (4 MiB objects)
    snapshot_count: 2
    id: 1abd04da8b9a4d
    block_name_prefix: rbd_data.1abd04da8b9a4d
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Tue Jul 9 13:07:36 2019
    access_timestamp: Thu Dec 19 01:35:34 2019
    modify_timestamp: Thu Dec 19 00:19:32 2019
    journal: 1abd04da8b9a4d
    mirroring state: enabled
    mirroring global id: c71ec81f-18be-4d0b-93ed-0cebe3e619bb
    mirroring primary: true

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx