Hello, I have a cluster running Jewel 10.2.0 with 25 OSDs and 4 MONs.
Today the cluster suddenly became unhealthy, with many PGs stuck due to unfound objects. There were no disk failures or node crashes; it just went bad.
I managed to bring the cluster back to a healthy state by marking the unfound objects as lost and deleting them with "ceph pg <id> mark_unfound_lost delete".
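
For anyone following along, here is a rough sketch of how the unfound objects can be inspected before they are marked lost. The PG id 2.5 is only a placeholder, and the run wrapper and function name are made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to actually execute them. Note that mark_unfound_lost delete discards the unfound objects for good.

```shell
#!/bin/sh
# Placeholder PG id (an assumption); use one reported by `ceph health detail`.
PGID="${PGID:-2.5}"

# Hypothetical helper: with DRY_RUN=1 (the default) commands are printed
# instead of executed, so the sequence can be reviewed safely first.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

inspect_and_mark_lost() {
    # Show which PGs are unhealthy and why.
    run ceph health detail
    # List the objects the PG considers missing/unfound (Jewel-era command).
    run ceph pg "$PGID" list_missing
    # Show peering state, including which OSDs were probed for the objects.
    run ceph pg "$PGID" query
    # Last resort: give up on the unfound objects and delete them.
    run ceph pg "$PGID" mark_unfound_lost delete
}

inspect_and_mark_lost
```

Running it as "PGID=<id> DRY_RUN=0 sh script.sh" would execute the commands against the cluster for the given PG.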
I still have no idea why the cluster went bad. I also noticed that restarting the OSD daemons to unblock stuck clients makes the cluster unhealthy again, with PGs getting stuck on unfound objects once more.
Has anyone else run into this issue?
---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade