Dear All,

In the last few days we've been facing a strange problem with RBD mapping in our 5-host cluster. The cluster has been running for 12 months and was updated from Quincy to Reef two weeks ago with no problems.

On Saturday we decided to shut down one of the 5 nodes to insert a test NVMe drive; our failure domain is set at host level. Before this operation the OSD flags were set (nodown, noout, norebalance, etc.). With one node down the cluster continued working correctly, with one exception: many RBD images mapped on various clients stopped working. This happened across various types of clients, so both our external Proxmox cluster and our Windows machines lost these mapped RBD devices. After bringing node 5 back into the cluster, the problem is still present.

To make things even stranger, the cluster is in HEALTH_OK state and there are no apparent issues on the OSDs. We then noticed that not all RBD images were lost, only those created in pools placed on HDD class devices. A number of images in SSD class pools are not affected by the problem, so we temporarily moved (cloned) the most important images to SSD pools to get them back up and working.

Now I'm seeking the community's help on how to investigate the problem. We created a new HDD class pool with a new image for test purposes, but it can't be mapped. In a few tests, mapping on Windows succeeded, but then the device was immediately removed because of a stale connection and lack of communication with the device. Testing from a cluster node with "rbd bench --io-type write test_hdd --pool=testhdd" works perfectly, so the OSDs and the cluster seem to be fine... we are quite lost at this.

Any suggestion on what to check?

Thanks in advance for your help!

Regards,
Daniele
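P.S. For reference, this is roughly how the test pool and image were set up; it's a sketch rather than the exact commands (the CRUSH rule name "replicated_hdd", the PG counts and the 10G image size are illustrative), while the pool and image names match the bench command above:

    # CRUSH rule restricted to HDD-class OSDs, failure domain = host
    ceph osd crush rule create-replicated replicated_hdd default host hdd

    # Replicated pool on that rule, initialized for RBD
    ceph osd pool create testhdd 64 64 replicated replicated_hdd
    rbd pool init testhdd

    # Test image
    rbd create testhdd/test_hdd --size 10G

    # Server-side write benchmark -- this always completes fine
    rbd bench --io-type write test_hdd --pool=testhdd

    # Client-side mapping attempt -- this is what fails or gets removed shortly after
    rbd map testhdd/test_hdd
    dmesg | tail -n 50   # kernel client messages after the map attempt

The bench from a cluster node always completes, while mapping the same image from clients either fails outright or the mapped device disappears shortly afterwards.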