Failed OSD has 29 Slow MDS Ops.

Hello,

Nautilus 14.2.16

I had an OSD go bad about 10 days ago.  Apparently, as it was going down,
some MDS ops got hung up waiting for it to come back.  I was out of town
for a couple of days and found the OSD 'Down and Out' when I checked in.
(Also, oddly, the cluster did not appear to initiate recovery right away;
recovery did not start until I rebooted the OSD node.)

As of right now, the damaged OSD is 'safe-to-destroy' but the slow ops are
still hanging around.  Earlier today I quiesced the clients that were
accessing the CephFS, then unmounted and re-mounted it.  However, this did
not clear the lingering ops.

When I had the node offline I verified that the HDD and NVMe associated
with the OSD appear to be healthy, so I plan to zap and re-deploy the OSD
using the same hardware.  I would also like to upgrade to 14.2.20 (the latest
Ceph release for Debian 10), but I'm hesitant to do any of this until I get
rid of these 29 slow ops.

Can anybody suggest a path forward?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx