OSD Crash for xattr "_" absent issue.

Wenjunh <huangwenjun310@xxxxxxxxx> · Wed, 26 Nov 2014 20:47:36 +0800



> Hi, Samuel & Sage
> 
> In our production environment, an OSD crashes when it tries to read the “_” xattr of an object whose data is inconsistent and the xattr is absent. This is described in the issue:
> 
> http://tracker.ceph.com/issues/10117.
> 
> I also found a two-year-old issue that describes the same bug:
> 
> http://tracker.ceph.com/issues/3676.
> 
> I think there is an apparent flaw in the related code. Could you please review my last comment, which describes ways to fix the bug?
> 
> I prefer the second way: if we can’t get the “_” xattr, we just delete the object instead of crashing the OSD. The object has two other replicas, which can serve the client’s request.
> The next time a self-healing process (scrub or deep scrub) runs, the object can be recovered from its peers.
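
A rough sketch of the second approach, written in Python against a toy in-memory store rather than Ceph's C++ code. ToyStore, MissingObjectInfo and read_from_replicas are hypothetical names used only for illustration, not Ceph APIs, and the real OSD read path, peering and scrub are not modelled here.

```python
XATTR_OI = "_"  # name of the xattr discussed in this report


class MissingObjectInfo(Exception):
    """Raised when an object's "_" xattr cannot be read (hypothetical)."""


class ToyStore:
    """In-memory stand-in for one OSD's local object store (hypothetical)."""

    def __init__(self):
        # object name -> {"data": bytes, "xattrs": dict}
        self.objects = {}

    def put(self, name, data, xattrs):
        self.objects[name] = {"data": data, "xattrs": dict(xattrs)}

    def get_object_info(self, name):
        obj = self.objects.get(name)
        if obj is None:
            raise KeyError(name)
        info = obj["xattrs"].get(XATTR_OI)
        if info is None:
            # Second approach: drop the damaged local copy and report an
            # error to the caller, instead of asserting and crashing.
            del self.objects[name]
            raise MissingObjectInfo(name)
        return info


def read_from_replicas(name, replicas):
    """Return the object info from the first replica that can supply it."""
    for store in replicas:
        try:
            return store.get_object_info(name)
        except (KeyError, MissingObjectInfo):
            continue  # this copy is missing or damaged, try the next one
    raise KeyError(name)
```

With three stores where only the first one lacks the xattr, read_from_replicas returns the info held by the second store, and the first store no longer holds the object afterwards. Whether dropping the local copy is safe in a real cluster is the open question raised in this mail.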
> 
> Because I am not very familiar with the source code, I don’t know whether this fix has any other side effects on the Ceph cluster.
> 
> If you have any ideas about the bug, please feel free to let me know.
> 
> Thanks
> 
> Wenjunh
