Blocking/Stuck/Corrupted files

Hi,
We started upgrading our Ceph cluster, consisting of 7 nodes, from Quincy to Reef two days ago. This included upgrading the underlying OS and several other small changes. After hitting the osd_remove_queue bug we were able to recover for the most part, but the cluster is still not in a healthy state because of changes to our network (see attached image).
Overall, though, we can mount the filesystem on all nodes and read from and write to it.


The problem now is that at least one file created by slurmctld exists that seems to have been corrupted during the upgrade. In addition, there are several more files that we have not yet identified but that are noticeable through stuck container executions. None of these files can be read or removed on any of the storage or worker nodes. All operations (cat, rm, less, ...) hang until they are forcefully terminated. I checked all nodes manually and could not identify any process with an open handle to the file.

A manual deep scrub of all PGs, as well as a normal scrub, did not show any further problems.

If possible, we would like to identify the affected files and then unblock or remove them.
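
For the identification part, something along the following lines might help to list the files whose reads hang. This is only a rough sketch and not a Ceph-specific tool: the function names probe and find_stuck and the 5 second timeout are made up for illustration. It assumes that a hanging read can still be killed (as is the case for us with cat and less) and that walking the directory tree itself does not block.

```python
import os
import subprocess


def probe(path, timeout=5.0):
    """Try to read one byte of `path` in a child process.

    Returns "ok" if the read finished, "error" if it failed quickly
    (e.g. missing file, permission denied), and "stuck" if it did not
    finish within `timeout` seconds.
    """
    try:
        result = subprocess.run(
            ["head", "-c", "1", "--", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # on expiry the child is killed
        )
    except subprocess.TimeoutExpired:
        return "stuck"
    return "ok" if result.returncode == 0 else "error"


def find_stuck(root, timeout=5.0):
    """Walk `root` and return the paths whose read timed out."""
    stuck = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if probe(path, timeout) == "stuck":
                stuck.append(path)
    return stuck
```

Calling find_stuck() on a subdirectory of the mounted filesystem would return the list of paths that timed out. Running the read in a child process keeps the scan itself from hanging on the first blocked file. A slow but healthy file could also be reported if the timeout is too short, so the result is only a candidate list.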



Cheers
Dominik
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


