Troubleshooting hanging storage backend whenever there is any cluster change

Hi everyone,

for some time now we have been experiencing service outages in our Ceph
cluster whenever anything changes the HEALTH status, e.g. swapping
storage devices, adding storage devices, rebooting Ceph hosts, or
during backfills etc.

Just now I had another such situation, where several VMs hung after I
rebooted one Ceph host. We have 3 replicas for each PG, 3 mons, 3
mgrs, 3 MDS and 71 OSDs spread over 9 hosts.
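In case the pool settings matter here, this is roughly how I would
dump the replication settings and the OSD layout (just a sketch, I
have not included the actual output here):

# ceph osd pool ls detail
# ceph osd tree

The first command shows size/min_size per pool, the second the
distribution of the 71 OSDs across the 9 hosts.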

We use Ceph as the storage backend for our Proxmox VE (PVE)
environment. The outages manifest as blocked virtual file systems
inside the virtual machines running in our PVE cluster.

It feels like stuck or inactive PGs to me. Honestly, though, I'm not
really sure how to debug this problem or which log files to examine.
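
During the next outage I would try to capture something along these
lines (a sketch of what I believe is relevant; osd.0 below is just a
placeholder OSD id, not a specific suspect):

# ceph health detail
# ceph pg dump_stuck inactive unclean
# ceph daemon osd.0 dump_blocked_ops
# ceph daemon osd.0 dump_ops_in_flight

The first two show the overall state and any stuck PGs; the daemon
commands have to be run on the host of the respective OSD and list
blocked/in-flight requests. Beyond that I would look at
/var/log/ceph/ceph.log on the mons (the cluster log) and at the
per-OSD logs /var/log/ceph/ceph-osd.<id>.log.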

OS: Debian 9
Kernel: 4.12 based upon SLE15-SP1

# ceph version
ceph version 12.2.8-133-gded2f6836f
(ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)

Can someone guide me? I'm more than happy to provide more information
as needed.

Thanks in advance
Nils
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


