On 5/29/14 01:09, Felix Lee wrote:
> Dear experts,
> Recently, a disk for one of our OSDs failed and took the OSD down.
> After I recovered the disk and filesystem, I noticed two problems:
>
> 1. Journal corruption, which prevents the OSD from starting:
>
>
>
> 2. I guess I may be able to use ceph-osd with the "--mkjournal" option to
> fix the journal corruption, but there is another thing that bothers me:
> the previous osd daemon is stuck in "D" state, so it can't be terminated.
> Usually, once the filesystem has recovered, a process should be able to
> leave D state, so I am not sure what causes this, or whether I can ignore
> it without any bad consequences.
>
> In any case, I would be very grateful if you experts could shed some
> light on this.
>
> Our current ceph version is ceph-0.72.2-0.el6.x86_64,
> and the filesystem backend is xfs with fiber direct attached storage.

I can't speak to the specific errors you're seeing, but it looks like you
have a failing or corrupted disk. Things I would investigate:

1. Is the disk itself failing? If this were a SATA disk, I'd check the
SMART stats on the disk. I haven't dealt with Fibre Channel disks since
before SMART was standardized, so I can't tell you how to do that.

2. Get rid of the old ceph-osd process. Reboot the node if you have to.
If things come up cleanly, then you're done.

3. Fsck the filesystem. If the FS is clean, then the damage is probably
confined to the OSD journal.

4. How quickly do you need this fixed? At this point, I'm out of
suggestions, so I'd remove the OSD, zap it, and add it back in. If you
can wait, somebody might have a better suggestion.

Fibre Channel hardware is much more complicated than SATA and SAS. There
are a lot more parts involved, which leaves more room for bugs. If you see
this problem come back on the same disk, I'd replace the disk. If you see
this happen again to other disks, I would get your Fibre Channel vendor
involved.
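On point 2: if you'd rather confirm from a script whether the old ceph-osd
is still wedged before deciding to reboot, the process state is visible in
/proc. Below is a minimal Python sketch, assuming a Linux host with /proc
mounted; parse_stat() and find_stuck() are hypothetical helper names of
mine, not anything shipped with Ceph. It reads the state field of
/proc/<pid>/stat, where "D" means uninterruptible sleep, which is usually a
task blocked inside the kernel waiting on I/O.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state).

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # single state character right after ") "
    return pid, comm, state


def find_stuck(name="ceph-osd", proc_root="/proc"):
    """Return sorted PIDs of processes named `name` that are in D state.

    Hypothetical helper for illustration; assumes a Linux-style /proc.
    """
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unexpected format
        if comm == name and state == "D":
            stuck.append(pid)
    return sorted(stuck)
```

Run it on the OSD node, e.g. print(find_stuck()). A non-empty list means
the old daemon is still blocked in the kernel. Such a process generally
won't respond even to SIGKILL until the pending I/O completes, which is why
a reboot is usually the practical way to clear it.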
It wouldn't hurt to make sure you have the latest firmware on the disks,
enclosure, and FC adapter.

--
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com
Central Desktop