OSD Restart results in "unfound objects"

Diego Castro <diego.castro@xxxxxxxxxxxxxx> · Wed, 1 Jun 2016 05:25:22 -0300

Hello, I have a cluster running Jewel 10.2.0 with 25 OSDs and 4 MONs. Today the cluster suddenly became unhealthy, with many PGs stuck due to unfound objects. There were no disk failures or node crashes; it just went bad.

I managed to get the cluster back to a healthy state by marking the unfound objects as lost with "ceph pg <id> mark_unfound_lost delete".
I still have no idea why the cluster went bad in the first place, but I noticed that restarting the OSD daemons to unblock stuck clients makes the cluster unhealthy again, with PGs stuck once more due to unfound objects.
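
For reference, here is a rough sketch of the command sequence around that workaround, written as a small Python helper that only builds the command lines and does not run anything. The function name, the PG id "2.4" and the ordering are illustrative assumptions, not the exact steps taken on this cluster. "list_missing" is the subcommand name as I understand it for Jewel-era releases (later releases call it "list_unfound"), so check it against your version.

```python
# Sketch only: builds ceph CLI command lines for one PG that reports
# unfound objects. Nothing is executed against a cluster here.

VALID_ACTIONS = ("revert", "delete")  # the two modes of mark_unfound_lost


def unfound_commands(pg_id, action):
    """Return the commands, in order, to inspect a PG and then give up
    on its unfound objects.

    pg_id  -- placement group id as a string, e.g. "2.4" (hypothetical)
    action -- "revert" rolls back to a previous version of each object,
              "delete" forgets the object entirely
    """
    if action not in VALID_ACTIONS:
        raise ValueError("action must be one of %s" % (VALID_ACTIONS,))
    return [
        ["ceph", "health", "detail"],            # which PGs have unfound objects
        ["ceph", "pg", pg_id, "query"],          # peering state, OSDs probed
        ["ceph", "pg", pg_id, "list_missing"],   # assumed Jewel-era name
        ["ceph", "pg", pg_id, "mark_unfound_lost", action],  # last resort
    ]


if __name__ == "__main__":
    # "2.4" is a placeholder PG id, not one from the affected cluster.
    for cmd in unfound_commands("2.4", "delete"):
        print(" ".join(cmd))
```

The last command discards data, so the inspection steps are listed first on purpose: the output of "query" shows which OSDs were probed for the missing objects, which may help explain why a restart brings the problem back.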

Has anyone else run into this issue?

---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com