Re: RBD images can't be mapped anymore

Hi,

can you please provide more information?
Which other flags did you set (noout should be sufficient, or just use the maintenance mode)?
Please share the output from:

ceph osd tree
ceph osd df
ceph osd pool ls detail

Also include the corresponding CRUSH rule that applies to the affected pool.
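For example (pool and rule names are placeholders), the rule attached to the
pool and its definition can be pulled with:

ceph osd pool get <pool> crush_rule
ceph osd crush rule dump <rule_name>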

Quoting Daniele Rimoldi <daniele.rimoldi@xxxxxxxxx>:

Dear All,

in the last few days we've been facing a strange problem with RBD mapping
in our 5-host cluster.

The cluster has been running for 12 months and was updated two weeks ago
from Quincy to Reef with no problems.

On Saturday, we decided to shut down one of the 5 nodes to insert a test NVMe
drive; our failure domain is set at the host level. Before this operation the
OSD flags were set (nodown, noout, norebalance, etc.).
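For reference, such flags are typically set and cleared along these lines
(noout and norebalance shown as examples, not the exact commands we ran):

ceph osd set noout
ceph osd set norebalance
# ... node maintenance ...
ceph osd unset norebalance
ceph osd unset noout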

With one node down, the cluster continued working correctly, with one
exception: many RBD images mapped on various clients stopped working. This
happened across various types of clients, so both our external Proxmox
cluster and our Windows machines lost their RBD-mapped devices.

After bringing node 5 back into the cluster, the problem is still present.
To make things even stranger, the cluster is in HEALTH_OK state and there are
no apparent issues on the OSDs.

We then noticed that not all RBD images were lost, but only those created
in pools placed on HDD-class devices. There are a number of images in
SSD-class pools that are not affected by the problem, so we temporarily
moved (cloned) the most important images to SSD pools to get them back
up and working.
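For illustration (pool and image names here are placeholders, not our real
ones), such a copy can be done with:

rbd cp hddpool/important_image ssdpool/important_image

or, to keep a clone linked to a protected snapshot instead of a full copy:

rbd snap create hddpool/important_image@migrate
rbd snap protect hddpool/important_image@migrate
rbd clone hddpool/important_image@migrate ssdpool/important_image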

Now I'm seeking community help on how to investigate the problem. We created
a new HDD-class pool with a new image for test purposes, but it can't be
mapped (see the sketch of such a setup below).
In a few tests, mapping in Windows succeeded, but then the device was
immediately removed because of a stale connection and lack of communication
with the device.
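A rough sketch of what such an HDD test setup could look like (rule name,
PG counts, size and image name are illustrative, not the exact commands):

ceph osd pool create testhdd 64 64 replicated replicated_hdd
rbd pool init testhdd
rbd create testhdd/test_hdd --size 10G
rbd map testhdd/test_hdd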

Testing from a cluster node with:
rbd bench --io-type write test_hdd --pool=testhdd
works perfectly, so the OSDs and the cluster seem to be fine... we are quite
lost at this point.
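For what it's worth, a client-side mapping check could look something like
this (names as in the sketch above, purely illustrative):

rbd map testhdd/test_hdd
dmesg | tail -n 50              # look for libceph / rbd connection errors
rbd status testhdd/test_hdd     # lists watchers, i.e. clients holding the image open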

Any suggestion on what to check?

Thanks in advance for your help!

Regards,

Daniele
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

