Hi,
Thank you for your input.
We checked the MDS, mgr and ceph-fuse logs but did not find much. The
e-mail was stuck in transfer for several days, so we had already found
the solution at the end of last week: a defective network interface
handling the public traffic on one of our storage nodes.
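For anyone running into something similar: a flaky NIC on the public
network tends to show up as slow ops or heartbeat/ping-time warnings in
the cluster status and as error counters on the interface itself. A
rough sketch of the checks (the interface name is just a placeholder):

    ceph health detail          # look for slow ops / OSD and MDS heartbeat warnings
    ip -s link show eno1        # RX/TX error and drop counters on the suspect interface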
Cheers
Dominik
On 2024-09-15 10:48, Joachim Kraftmayer wrote:
Hi Dominik,
I assume that you are talking about a CephFS problem. To identify the
root cause you have to debug the log files of the MDS servers.
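If the default log level does not show anything useful, you can
temporarily raise the MDS debug level, roughly like this (only a
sketch; revert afterwards, the logs grow very quickly):

    ceph config set mds debug_mds 20    # verbose MDS logging
    ceph config set mds debug_ms 1      # messenger-level logging
    # reproduce the problem, then check /var/log/ceph/ on the active MDS host
    ceph config rm mds debug_mds        # back to defaults
    ceph config rm mds debug_ms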
Joachim
joachim.kraftmayer@xxxxxxxxx
www.clyso.com [1]
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306
On Sat, 14 Sept 2024 at 14:24, dominik.baack
<dominik.baack@xxxxxxxxxxxxxxxxx> wrote:
Hi,
we started upgrading our Ceph cluster, consisting of 7 nodes, from
Quincy to Reef two days ago. This included an upgrade of the underlying
OS and several other small changes.
After hitting the osd_remove_queue bug we could mostly recover, but we
are still in a non-healthy state because of changes to our network (see
attached image). Overall, though, we can mount the filesystem on all
nodes and read and write.
The problem now is that at least one file created by slurmctld exists
which seems to have been damaged somewhere during this. It cannot be
read or removed on any of the storage or worker nodes. All operations
on it (cat, rm, less, ...) hang until they are forcefully terminated. I
checked all nodes manually and could not identify any process with an
open handle to the file.
If possible we would like to unblock the file, but removing it would
also be a possibility.
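If it helps with narrowing this down, a sketch of the checks we could
still run against the active MDS (mds.<id> is a placeholder for the
actual daemon name):

    # on the host running the active MDS:
    ceph daemon mds.<id> dump_ops_in_flight   # requests currently stuck in the MDS
    ceph daemon mds.<id> dump_blocked_ops     # requests blocked for a long time
    ceph daemon mds.<id> session ls           # client sessions and how many caps each holds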
Cheers
Dominik
Links:
------
[1] http://www.clyso.com/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx