Hi Ceph Users,

We have deployed a cloud infrastructure using Ceph (version 0.80.1) for storage and OpenNebula (version 4.6.1) for the compute nodes. The Ceph pool is configured with a replication size of 3.

We noticed that one OSD was down. We then checked the VMs (running CentOS 5.10 Final and Ubuntu 14.04) and found the following kernel errors/messages.

KERNEL ERRORS SEEN ON CENTOS 5.10 FINAL

    hda: task_out_intr: status=0x50 { DriveReady SeekComplete }
    ide: failed opcode was: unknown
    hdc: dma_timer_expiry: dma status == 0x21
    hda: irq timeout: status=0xd0 { Busy }
    ide: failed opcode was: unknown
    ide0: reset: success

KERNEL ERRORS SEEN ON UBUNTU 14.04

    [2955698.353338] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    [2955698.356164] ata1.00: failed command: WRITE DMA
    [2955698.358428] ata1.00: cmd ca/00:08:58:a6:79/00:00:00:00:00/e0 tag 0 dma 4096 out
    [2955698.358428]          res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
    [2955698.363070] ata1.00: status: { DRDY }
    [2955698.364598] ata1: soft resetting link
    [2955698.522853] ata1.00: configured for MWDMA2
    [2955698.523840] ata1.01: configured for MWDMA2
    [2955698.524447] ata1.00: device reported invalid CHS sector 0
    [2955698.524476] ata1: EH complete
    [2956272.037421] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    [2956272.037424] ata1.00: failed command: WRITE DMA
    [2956272.037429] ata1.00: cmd ca/00:08:58:a6:79/00:00:00:00:00/e0 tag 0 dma 4096 out
    [2956272.037429]          res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
    [2956272.037430] ata1.00: status: { DRDY }
    [2956272.037543] ata1: soft resetting link
    [2956272.193802] ata1.00: configured for MWDMA2
    [2956272.194259] ata1.01: configured for MWDMA2
    [2956272.194546] ata1.00: device reported invalid CHS sector 0
    [2956272.194560] ata1: EH complete

We observed the cluster for a few hours and concluded that the OSD was flapping, so we decided to take the OSD out of the cluster.
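One way to line the guest-side errors up against the OSD's down/up history is to extract the timestamps of the failed ATA commands and look at the gaps between them. The following is a minimal sketch, not something from the original report: it assumes the Ubuntu 14.04 dmesg format shown in the log (`[seconds] ata1.00: failed command: ...`), and the function names are hypothetical. Note that dmesg timestamps are seconds since guest boot, so they would need converting to wall-clock time (for example with `dmesg -T`) before comparing them with Ceph log timestamps.

```python
import re

# Matches dmesg lines reporting a failed ATA command, e.g.
# "[2955698.356164] ata1.00: failed command: WRITE DMA".
# Assumption: the Ubuntu 14.04 guest log format shown in the post.
FAILED_CMD = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+\.\d+): failed command: (.+)$")


def failed_command_times(lines):
    """Return (seconds_since_boot, device, command) for each failed ATA command."""
    events = []
    for line in lines:
        m = FAILED_CMD.match(line.strip())
        if m:
            events.append((float(m.group(1)), m.group(2), m.group(3)))
    return events


def intervals(events):
    """Return the seconds elapsed between consecutive failed commands."""
    times = [t for t, _, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Three lines taken from the Ubuntu 14.04 log; only two are failures.
    sample = [
        "[2955698.356164] ata1.00: failed command: WRITE DMA",
        "[2955698.363070] ata1.00: status: { DRDY }",
        "[2956272.037424] ata1.00: failed command: WRITE DMA",
    ]
    events = failed_command_times(sample)
    print(events)
    print([round(gap, 1) for gap in intervals(events)])
```

For the two WRITE DMA timeouts in the log, the gap is roughly 574 seconds; if the OSD's down/up transitions recur on a similar period, that would support the link between the flapping OSD and the guest I/O timeouts.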
We checked the VMs again, but these errors are still appearing. Any suggestions for our next steps?

Regards,
Pons
Apollo Global Corp.