Hi,
we started upgrading our Ceph cluster, which consists of 7 nodes, from
Quincy to Reef two days ago. This included upgrading the underlying OS and
several other small changes.
After hitting the osd_remove_queue bug, we were able to recover for the
most part, but the cluster is still not healthy because of changes to our
network (see attached image).
Overall, though, we can mount the filesystem on all nodes and read and
write to it.
The problem now is that at least one file created by slurmctld seems to
have been compromised somehow in the process. It cannot be read or
removed on any of the storage or working nodes. All operations (cat, rm,
less, ...) hang until they are forcibly terminated. I checked all
nodes manually and could not identify any process with an open handle to
the file.
If possible, we would like to unblock the file, but removing it would also
be an option.
Cheers
Dominik