OSDs not starting after journal drive replacement

Hi, last night one of the PCI SSD drives that we use as the OSD journal disk died, so we had to replace it, in this case with an 800GB SATA SSD. After recreating the journals, 9 of the 11 OSDs on the server no longer stay up: they start, but after a minute the OSD goes down.
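Because only the OSDs whose journals were recreated are failing, one early check is whether each OSD's journal entry actually resolves to a partition on the new SSD. The following is a minimal Python sketch. It assumes the FileStore layout, where each OSD data directory (by default /var/lib/ceph/osd/ceph-<id>) holds a `journal` entry that is typically a symlink to the journal partition. The function name `journal_targets` is hypothetical.

```python
import os
from pathlib import Path


def journal_targets(osd_root="/var/lib/ceph/osd"):
    """Map each OSD data directory name (e.g. 'ceph-6') to (journal path, exists).

    Assumes the FileStore layout, where each OSD data directory holds a
    'journal' entry that is usually a symlink to a partition on the journal SSD.
    """
    result = {}
    for osd_dir in sorted(Path(osd_root).glob("ceph-*")):
        journal = osd_dir / "journal"
        if journal.is_symlink():
            # realpath resolves the link even when its target no longer exists
            target = os.path.realpath(journal)
            result[osd_dir.name] = (target, os.path.exists(target))
        elif journal.exists():
            # A regular file journal living inside the data directory
            result[osd_dir.name] = (str(journal), True)
        else:
            result[osd_dir.name] = (None, False)
    return result


if __name__ == "__main__":
    for osd, (target, ok) in journal_targets().items():
        print(f"{osd}: {target} [{'ok' if ok else 'MISSING'}]")
```

An OSD whose link is dangling, or still resolves to the old device's path, is worth re-checking before digging into the crash itself. The extract below does not show whether this is what triggers the abort.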

Looking at the logs, I see that the daemon dies right after a *** Caught signal (Aborted) ** message.

Extract from ceph-osd.6.log:

   -11> 2015-08-15 09:33:16.937820 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937294, event: header_read, op: pg_info(1 pgs e10407:18.2c)
   -10> 2015-08-15 09:33:16.937822 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937296, event: throttled, op: pg_info(1 pgs e10407:18.2c)
    -9> 2015-08-15 09:33:16.937826 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937369, event: all_read, op: pg_info(1 pgs e10407:18.2c)
    -8> 2015-08-15 09:33:16.937830 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937819, event: dispatched, op: pg_info(1 pgs e10407:18.2c)
    -7> 2015-08-15 09:33:16.937834 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937834, event: waiting_for_osdmap, op: pg_info(1 pgs e10407:18.2c)
    -6> 2015-08-15 09:33:16.937837 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937837, event: started, op: pg_info(1 pgs e10407:18.2c)
    -5> 2015-08-15 09:33:16.937848 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937848, event: done, op: pg_info(1 pgs e10407:18.2c)
    -4> 2015-08-15 09:33:16.937860 7f32e6167700  1 -- 172.18.4.6:6800/21878 <== osd.11 172.18.4.7:6840/7934 44 ==== pg_info(1 pgs e10407:21.78) v4 ==== 759+0+0 (2132192505 0 0) 0x1da59fe0 con 0x15e9a520
    -3> 2015-08-15 09:33:16.937869 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928344, event: header_read, op: pg_info(1 pgs e10407:21.78)
    -2> 2015-08-15 09:33:16.937871 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928346, event: throttled, op: pg_info(1 pgs e10407:21.78)
    -1> 2015-08-15 09:33:16.937876 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928388, event: all_read, op: pg_info(1 pgs e10407:21.78)
     0> 2015-08-15 09:33:16.937829 7f32de958700 -1 *** Caught signal (Aborted) **
 in thread 7f32de958700
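The extract is Ceph's dump of recent in-memory log events. These are numbered backwards, ending at 0>, which is the signal line. The following small Python sketch pulls out the crash entry and any earlier entries logged by the same thread. The function names are hypothetical, and the regex assumes the line layout shown above.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches one line of the recent-events dump, e.g.
#   "    -4> 2015-08-15 09:33:16.937860 7f32e6167700  1 -- ..."
ENTRY_RE = re.compile(
    r"^\s*(?P<seq>-?\d+)>\s+(?P<ts>\S+ \S+)\s+(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+)\s+(?P<msg>.*)$"
)


class Entry(NamedTuple):
    seq: int        # -11, -10, ..., 0 (0 is the newest entry)
    timestamp: str
    thread: str     # thread id in hex
    level: int      # log level; the signal line is logged at -1
    message: str


def parse_entries(text: str) -> List[Entry]:
    """Parse dump lines; continuation lines such as ' in thread ...' are skipped."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append(Entry(int(m["seq"]), m["ts"], m["thread"],
                                 int(m["level"]), m["msg"]))
    return entries


def crash_context(text: str) -> Tuple[Optional[Entry], List[Entry]]:
    """Return the 'Caught signal' entry and earlier entries from the same thread."""
    entries = parse_entries(text)
    crash = next((e for e in entries if "Caught signal" in e.message), None)
    if crash is None:
        return None, []
    same_thread = [e for e in entries
                   if e.thread == crash.thread and e.seq < crash.seq]
    return crash, same_thread
```

Applied to the extract above, this reports the abort on thread 7f32de958700 and returns no earlier entries. Every preceding line comes from thread 7f32e6167700, which is dispatching the pg_info messages. Whatever explains the abort would typically be in the backtrace printed after the signal line, which is in the full log rather than in this extract.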


I have the full log in case you need more information.

Thank you for your help.

--
*Francisco J. Araya Maggiolo*
Devops Engineer & Cloud Specialist
KIO Networks
Mexico City
Phone: +52 (55) 8503 2600 ext. 3901
Mobile: +52 (1) (55) 6066 9025
http://www.kionetworks.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
