Hi, last night one of the PCI SSD drives that we use as an OSD journal
disk died, so we had to replace it, in this case with an 800 GB SATA SSD.
After recreating the journals, 9 of the 11 OSDs on the server no longer
stay up (they start, but after a minute the OSD goes down).
Looking at the logs, I see that the service dies after a
*** Caught signal (Aborted) ** message.

Extract from ceph-osd.6.log:
-11> 2015-08-15 09:33:16.937820 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937294, event: header_read, op: pg_info(1 pgs e10407:18.2c)
-10> 2015-08-15 09:33:16.937822 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937296, event: throttled, op: pg_info(1 pgs e10407:18.2c)
-9> 2015-08-15 09:33:16.937826 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937369, event: all_read, op: pg_info(1 pgs e10407:18.2c)
-8> 2015-08-15 09:33:16.937830 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937819, event: dispatched, op: pg_info(1 pgs e10407:18.2c)
-7> 2015-08-15 09:33:16.937834 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937834, event: waiting_for_osdmap, op: pg_info(1 pgs e10407:18.2c)
-6> 2015-08-15 09:33:16.937837 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937837, event: started, op: pg_info(1 pgs e10407:18.2c)
-5> 2015-08-15 09:33:16.937848 7f32e6167700 5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937848, event: done, op: pg_info(1 pgs e10407:18.2c)
-4> 2015-08-15 09:33:16.937860 7f32e6167700 1 -- 172.18.4.6:6800/21878 <== osd.11 172.18.4.7:6840/7934 44 ==== pg_info(1 pgs e10407:21.78) v4 ==== 759+0+0 (2132192505 0 0) 0x1da59fe0 con 0x15e9a520
-3> 2015-08-15 09:33:16.937869 7f32e6167700 5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928344, event: header_read, op: pg_info(1 pgs e10407:21.78)
-2> 2015-08-15 09:33:16.937871 7f32e6167700 5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928346, event: throttled, op: pg_info(1 pgs e10407:21.78)
-1> 2015-08-15 09:33:16.937876 7f32e6167700 5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928388, event: all_read, op: pg_info(1 pgs e10407:21.78)
0> 2015-08-15 09:33:16.937829 7f32de958700 -1 *** Caught signal (Aborted) **
in thread 7f32de958700
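
For anyone who wants to pull the same kind of extract from the other failing OSDs, here is a minimal Python sketch. It scans a log for the first "*** Caught signal (...) **" marker and returns the signal name plus the few lines logged just before it. The function name find_crash and the context parameter are hypothetical, made up for this illustration; the only thing taken from the log above is the wording of the marker line.

```python
import re
from collections import deque

# Matches the crash marker seen in the extract above,
# e.g. "*** Caught signal (Aborted) **".
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \(([^)]+)\) \*\*")


def find_crash(lines, context=5):
    """Return (signal_name, preceding_lines) for the first crash marker.

    `lines` is any iterable of log lines (an open file works).
    `context` is how many lines before the marker to keep.
    Returns None if no marker is found.
    """
    recent = deque(maxlen=context)  # rolling window of the last lines seen
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            return match.group(1), list(recent)
        recent.append(line.rstrip("\n"))
    return None
```

It could be used along the lines of `with open(path) as f: print(find_crash(f, context=20))`, where path points at the ceph-osd.N.log of the OSD in question (the log location depends on the installation, so it is left as a parameter here).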
--
*Francisco J. Araya Maggiolo*
Devops Engineer & Cloud Specialist
KIO Networks
Mexico City
Phone: +52 (55) 8503 2600 ext. 3901
Mobile: +52 (1) (55) 6066 9025
http://www.kionetworks.com