Hi Dominik,

I assume you are talking about a CephFS problem. To identify the root cause, you will have to examine the log files of the MDS servers.

Joachim

joachim.kraftmayer@xxxxxxxxx
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306

On Sat, 14 Sept 2024 at 14:24, dominik.baack <dominik.baack@xxxxxxxxxxxxxxxxx> wrote:

> Hi,
>
> two days ago we started upgrading our Ceph cluster, which consists of 7
> nodes, from Quincy to Reef. This included an upgrade of the underlying OS
> and several other small changes.
>
> After hitting the osd_remove_queue bug we were able to recover for the
> most part, but we are still in a non-healthy state because of changes to
> our network (see attached image). Overall we can mount the filesystem on
> all nodes and read and write.
>
> The problem now is that at least one file created by slurmctld exists
> that appears to have been corrupted in the process. It cannot be read or
> removed on any of the storage or worker nodes. All operations on it (cat,
> rm, less, ...) hang until they are forcefully terminated. I checked all
> nodes manually and could not identify any process with an open handle to
> the file.
>
> If possible we would like to unblock the file, but removing it would also
> be a possibility.
>
> Cheers
> Dominik
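
For anyone chasing the same symptom, here is a minimal sketch of the MDS-side
checks Joachim is pointing at. It assumes the ceph CLI is available and that
the daemon commands are run on the host of the active MDS; mds.<name> and the
inode number are placeholders for your own values:

    # Raise MDS log verbosity cluster-wide (20 is very verbose; revert when done)
    ceph config set mds debug_mds 20

    # On the active MDS host: look for requests stuck inside the MDS
    ceph daemon mds.<name> dump_blocked_ops
    ceph daemon mds.<name> dump_ops_in_flight

    # List client sessions; a client shown here may still hold capabilities
    # on the stuck file even if no local process has an open handle
    ceph daemon mds.<name> session ls

    # Inspect the problematic inode (get its number with 'ls -i <file>' on a
    # client; CephFS exposes its inode numbers directly)
    ceph daemon mds.<name> dump inode <inode-number>

    # Restore the default log level afterwards
    ceph config rm mds debug_mds

If "session ls" shows a client holding caps on that inode, evicting that
session (ceph tell mds.<name> client evict id=<session-id>) will usually
release the file, at the cost of forcing the client to reconnect.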