You're running 0.87-6. There were various fixes for this problem in Firefly. Were any of these snapshots created on an early version of Firefly?
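(If you need to confirm what the daemons themselves are running now -- it won't tell you which release a given snapshot was created under -- something like this should report each OSD's version:

    ceph tell osd.* version

or per-daemon, ceph tell osd.N version.)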
So far, every fix for this issue has gotten developers involved. I'd see if you can talk to some devs on IRC, or post to the ceph-devel mailing list.
My own experience is that I had to delete the affected PGs, and force create them. Hopefully there's a better answer now.
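Roughly, that meant something like the following; the OSD path and PG ID here are just placeholders, and force-creating a PG throws away whatever data it held, so double-check everything against your own cluster first:

    # with the crashing OSD stopped, remove the broken PG copy from its store
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 3.1a --op remove

    # then tell the monitors to recreate the PG empty
    ceph pg force_create_pg 3.1a

(ceph_objectstore_tool ships with Giant; verify the exact tool name and options on your version before running anything destructive.)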
On Fri, Nov 7, 2014 at 8:10 PM, Chu Duc Minh <chu.ducminh@xxxxxxxxx> wrote:
Thank you!

After many retries and checking the logs, I guess reason (2) is the main cause, because if (1) were the main cause, the other OSDs (which also contain the buggy volume/snapshot) would crash too.

One of my OSDs has problems and can NOT be started. I tried to start it many times, but it always crashes a few minutes after starting. I can think of two reasons for the crash:
1. A read/write request hits this OSD and, due to the corrupted volume/snapshot/parent-image/..., it crashes.
2. The recovery process can NOT work properly due to the corrupted volumes/snapshot/parent-image/...

State of my ceph cluster (just a few seconds before crash time):
111/57706299 objects degraded (0.001%)
14918 active+clean
1 active+clean+scrubbing+deep
52 active+recovery_wait+degraded
2 active+recovering+degraded
PS: I have attached the crash-dump log of that OSD to this email for your information.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com