Hi,

I have found something. After the restart, the time on the server was wrong (+2 hours) until NTP fixed it.
I restarted these 3 OSDs, but it did not help.
Is it possible that Ceph banned these OSDs? Or did starting with the wrong time break their filestore?

--
Regards
Dominik

2013/10/14 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
> Hi,
> I had a server failure that started with a single disk failure:
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023986] sd 4:2:26:0: [sdaa] Unhandled error code
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023990] sd 4:2:26:0: [sdaa] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023995] sd 4:2:26:0: [sdaa] CDB: Read(10): 28 00 00 00 00 d0 00 00 10 00
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024005] end_request: I/O error, dev sdaa, sector 208
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024744] XFS (sdaa): metadata I/O error: block 0xd0 ("xfs_trans_read_buf") error 5 buf count 8192
> Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.025879] XFS (sdaa): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.820288] XFS (sdaa): metadata I/O error: block 0xd0 ("xfs_trans_read_buf") error 5 buf count 8192
> Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.821194] XFS (sdaa): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> Oct 14 03:25:32 s3-10-177-64-6 kernel: [1027264.667851] XFS (sdaa): metadata I/O error: block 0xd0 ("xfs_trans_read_buf") error 5 buf count 8192
>
> This caused the server to become unresponsive.
>
> After the server restart, 3 of the 26 OSDs on it are down.
> The ceph-osd log, after setting "debug osd = 10" and restarting, shows:
>
> 2013-10-14 06:21:23.141936 7fdeb4872700 -1 osd.47 43203 *** Got signal Terminated ***
> 2013-10-14 06:21:23.142141 7fdeb4872700 -1 osd.47 43203 pausing thread pools
> 2013-10-14 06:21:23.142146 7fdeb4872700 -1 osd.47 43203 flushing io
> 2013-10-14 06:21:25.406187 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and appears to work
> 2013-10-14 06:21:25.406204 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2013-10-14 06:21:25.406557 7f02690f9780 0 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
> 2013-10-14 06:21:25.412617 7f02690f9780 0 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported (by glibc and kernel)
> 2013-10-14 06:21:25.412831 7f02690f9780 0 filestore(/vol0/data/osd.47) mount found snaps <>
> 2013-10-14 06:21:25.415798 7f02690f9780 0 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode: btrfs not detected
> 2013-10-14 06:21:26.078377 7f02690f9780 2 osd.47 0 mounting /vol0/data/osd.47 /vol0/data/osd.47/journal
> 2013-10-14 06:21:26.080872 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and appears to work
> 2013-10-14 06:21:26.080885 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2013-10-14 06:21:26.081289 7f02690f9780 0 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
> 2013-10-14 06:21:26.087524 7f02690f9780 0 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported (by glibc and kernel)
> 2013-10-14 06:21:26.087582 7f02690f9780 0 filestore(/vol0/data/osd.47) mount found snaps <>
> 2013-10-14 06:21:26.089614 7f02690f9780 0 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode: btrfs not detected
> 2013-10-14 06:21:26.726676 7f02690f9780 2 osd.47 0 boot
> 2013-10-14 06:21:26.726773 7f02690f9780 10 osd.47 0 read_superblock sb(16773c25-5054-4451-bf9f-efc1f7f21b89 osd.47 63cf7d70-99cb-0ab1-4006-00000000002f e43203 [41261,43203] lci=[43194,43203])
> 2013-10-14 06:21:26.726862 7f02690f9780 10 osd.47 0 add_map_bl 43203 82622 bytes
> 2013-10-14 06:21:26.727184 7f02690f9780 10 osd.47 43203 load_pgs
> 2013-10-14 06:21:26.727643 7f02690f9780 10 osd.47 43203 load_pgs ignoring unrecognized meta
> 2013-10-14 06:21:26.727681 7f02690f9780 10 osd.47 43203 load_pgs 3.df1_TEMP clearing temp
>
> osd.47 is still down; I marked it out of the cluster.
> 47 1 osd.47 down 0
>
> How can I check what is wrong?
>
> ceph -v
> ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
>
> --
> Regards
> Dominik

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
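
A minimal sketch for tallying the kernel I/O errors quoted above per block device, so that a failing disk such as sdaa stands out when the log is long. The function name `summarize_io_errors` and the two regular expressions are illustrative assumptions derived only from the log lines in this message; they are not part of any Ceph or kernel tooling and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns assumed from the kernel messages quoted in this thread, e.g.
#   "... end_request: I/O error, dev sdaa, sector 208"
#   "... XFS (sdaa): metadata I/O error: block 0xd0 ..."
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
XFS_ERROR = re.compile(r"XFS \((\w+)\): metadata I/O error: block (0x[0-9a-f]+)")


def summarize_io_errors(lines):
    """Count block-layer and XFS metadata I/O error lines per device."""
    counts = Counter()
    for line in lines:
        match = END_REQUEST.search(line) or XFS_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the kernel log in this message.
sample = [
    "Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024005] "
    "end_request: I/O error, dev sdaa, sector 208",
    "Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024744] "
    'XFS (sdaa): metadata I/O error: block 0xd0 ("xfs_trans_read_buf") '
    "error 5 buf count 8192",
    "Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.025879] "
    "XFS (sdaa): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.",
]

for device, count in summarize_io_errors(sample).items():
    print(device, count)
```

The third sample line is deliberately not counted: it reports the return code of the failed read rather than a separate I/O error. To run this against a real log, pass an open file object (for example the kernel log file on that host) instead of `sample`.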