On 02/19/2014 02:22 PM, Thorvald Hallvardsson wrote:
Eventually, after 1 hour, it spotted that. I took the disk out at 11:06:02, so literally 1 hour later:

6    0.9    osd.6    down    0
7    0.9    osd.7    up      1
8    0.9    osd.8    up      1

2014-02-19 12:06:02.802388 mon.0 [INF] osd.6 172.17.12.15:6800/1569 failed (3 reports from 3 peers after 22.338687 >= grace 20.000000)

But 1 hour is a bit... too long, isn't it?
The OSD will commit suicide if it encounters too many I/O errors, but it's not clear what exactly happened in this case.
I suggest you take a look at the logs of osd.6 to see why it stopped working.
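
For example, something along these lines (just a sketch, assuming the default log and admin-socket paths; adjust paths and OSD ids to your setup):

# point at which osd.6 hit the failed disk and aborted, if it did
grep -iE 'error|abort|suicide|Input/output' /var/log/ceph/ceph-osd.6.log | tail -n 50

# kernel-side view of the disk disappearing
dmesg | grep -iE 'sdb|I/O error' | tail -n 20

# the failure-detection settings in play (the mon log above already shows grace 20s);
# query a surviving OSD on the same host over its admin socket
ceph --admin-daemon /var/run/ceph/ceph-osd.7.asok config show | grep -E 'heartbeat|suicide'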
Wido
On 19 February 2014 11:31, Thorvald Hallvardsson <thorvald.hallvardsson@xxxxxxxxx> wrote:

Hi guys,

Quick question. I have a VM with some SCSI drives which act as the OSDs in my test lab. I have removed one of the SCSI drives so it is completely gone from the system, and syslog is logging I/O errors, but the cluster still looks healthy. Can you tell me why? I'm trying to reproduce what would happen if a real drive failed.

# ll /dev/sd*
brw-rw---- 1 root disk 8,  0 Feb 19 11:13 /dev/sda
brw-rw---- 1 root disk 8,  1 Feb 17 16:45 /dev/sda1
brw-rw---- 1 root disk 8,  2 Feb 17 16:45 /dev/sda2
brw-rw---- 1 root disk 8,  5 Feb 17 16:45 /dev/sda5
brw-rw---- 1 root disk 8, 32 Feb 19 11:13 /dev/sdc
brw-rw---- 1 root disk 8, 33 Feb 17 16:45 /dev/sdc1
brw-rw---- 1 root disk 8, 34 Feb 19 11:11 /dev/sdc2
brw-rw---- 1 root disk 8, 48 Feb 19 11:13 /dev/sdd
brw-rw---- 1 root disk 8, 49 Feb 17 16:45 /dev/sdd1
brw-rw---- 1 root disk 8, 50 Feb 19 11:05 /dev/sdd2

Feb 19 11:06:02 ceph-test-vosd-03 kernel: [586497.813485] sd 2:0:1:0: [sdb] Synchronizing SCSI cache
Feb 19 11:06:13 ceph-test-vosd-03 kernel: [586508.197668] XFS (sdb1): metadata I/O error: block 0x39e116d3 ("xlog_iodone") error 19 numblks 64
Feb 19 11:06:13 ceph-test-vosd-03 kernel: [586508.197815] XFS (sdb1): xfs_do_force_shutdown(0x2) called from line 1115 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_log.c. Return address = 0xffffffffa01e1fe1
Feb 19 11:06:13 ceph-test-vosd-03 kernel: [586508.197823] XFS (sdb1): Log I/O Error Detected. Shutting down filesystem
Feb 19 11:06:13 ceph-test-vosd-03 kernel: [586508.197880] XFS (sdb1): Please umount the filesystem and rectify the problem(s)
Feb 19 11:06:43 ceph-test-vosd-03 kernel: [586538.306817] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:07:13 ceph-test-vosd-03 kernel: [586568.415986] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:07:43 ceph-test-vosd-03 kernel: [586598.525178] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:08:13 ceph-test-vosd-03 kernel: [586628.634356] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:08:43 ceph-test-vosd-03 kernel: [586658.743533] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:09:13 ceph-test-vosd-03 kernel: [586688.852714] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:09:43 ceph-test-vosd-03 kernel: [586718.961903] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:10:13 ceph-test-vosd-03 kernel: [586749.071076] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:10:43 ceph-test-vosd-03 kernel: [586779.180263] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:11:13 ceph-test-vosd-03 kernel: [586809.289440] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:11:44 ceph-test-vosd-03 kernel: [586839.398626] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:12:14 ceph-test-vosd-03 kernel: [586869.507804] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:12:44 ceph-test-vosd-03 kernel: [586899.616988] XFS (sdb1): xfs_log_force: error 5 returned.
Feb 19 11:12:52 ceph-test-vosd-03 kernel: [586907.848993] end_request: I/O error, dev fd0, sector 0

mount:
/dev/sdb1 on /var/lib/ceph/osd/ceph-6 type xfs (rw,noatime)
/dev/sdc1 on /var/lib/ceph/osd/ceph-7 type xfs (rw,noatime)
/dev/sdd1 on /var/lib/ceph/osd/ceph-8 type xfs (rw,noatime)

# ll /var/lib/ceph/osd/ceph-6
ls: cannot access /var/lib/ceph/osd/ceph-6: Input/output error

-4    2.7        host ceph-test-vosd-03
 6    0.9            osd.6    up    1
 7    0.9            osd.7    up    1
 8    0.9            osd.8    up    1

# ceph-disk list
/dev/fd0 other, unknown
/dev/sda :
 /dev/sda1 other, ext2
 /dev/sda2 other
 /dev/sda5 other, LVM2_member
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.7, journal /dev/sdc2
 /dev/sdc2 ceph journal, for /dev/sdc1
/dev/sdd :
 /dev/sdd1 ceph data, active, cluster ceph, osd.8, journal /dev/sdd2
 /dev/sdd2 ceph journal, for /dev/sdd1

  cluster 1a588c94-6f5e-4b04-bc07-f5ce99b91a35
   health HEALTH_OK
   monmap e7: 3 mons at {ceph-test-mon-01=172.17.12.11:6789/0,ceph-test-mon-02=172.17.12.12:6789/0,ceph-test-mon-03=172.17.12.13:6789/0}, election epoch 50, quorum 0,1,2 ceph-test-mon-01,ceph-test-mon-02,ceph-test-mon-03
   mdsmap e4: 1/1/1 up {0=ceph-test-admin=up:active}
   osdmap e124: 9 osds: 9 up, 9 in
    pgmap v1812: 256 pgs, 13 pools, 1522 MB data, 469 objects
          3379 MB used, 8326 GB / 8329 GB avail
               256 active+clean

So as you can see, osd.6's disk is gone (it no longer even shows up in ceph-disk list), but the cluster is still happy.

Thank you.

Regards.
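
For what it's worth, once you know the disk is dead you don't have to wait for the cluster to notice it on its own; the OSD can be marked down and out by hand with the standard CLI. A sketch, using osd.6 as the example id:

# mark the OSD down and out so recovery starts immediately
ceph osd down 6
ceph osd out 6

# confirm the cluster's view and watch the data re-replicate
ceph osd tree
ceph -w

# alternatively, forcing the still-"up" OSD to write to its backing store
# should make it hit the I/O error and fail on its own
ceph tell osd.6 bench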
--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com