OSD suffers problems after filesystem crash and recovery

Dear experts,
Recently, a disk backing one of our OSDs failed and took the OSD down.
After I recovered the disk and the filesystem, I noticed two problems:

1. Journal corruption, which prevents the OSD from starting:

     -2> 2014-05-28 22:21:19.592034 7f5c6ff437a0  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
     -1> 2014-05-28 22:21:19.606611 7f5c6ff437a0 -1 journal Unable to read past sequence 595649608 but header indicates the journal has committed up through 595649647, journal is corrupt
      0> 2014-05-28 22:21:19.608234 7f5c6ff437a0 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f5c6ff437a0 time 2014-05-28 22:21:19.606625
os/FileJournal.cc: 1697: FAILED assert(0)
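
To put a number on the mismatch in the log above, here is a minimal Python sketch that pulls the two sequence numbers out of the error line and computes the gap between them. The function name `journal_gap` and the regular expression are illustrative assumptions, not part of Ceph; the log text is copied from the output above.

```python
import re

# Journal error line copied from the OSD log above (joined onto one line).
LOG_LINE = (
    "journal Unable to read past sequence 595649608 but header indicates "
    "the journal has committed up through 595649647, journal is corrupt"
)

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(
    r"Unable to read past sequence (\d+) but header indicates "
    r"the journal has committed up through (\d+)"
)


def journal_gap(line):
    """Return (last_readable, committed, gap), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    last_readable = int(match.group(1))
    committed = int(match.group(2))
    return last_readable, committed, committed - last_readable


print(journal_gap(LOG_LINE))  # (595649608, 595649647, 39)
```

So the header claims commits up to 39 sequence numbers beyond the last entry that can actually be read.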


2. I guess I can run ceph-osd with the "--mkjournal" option to fix the
journal corruption, but another thing bothers me: the previous OSD
daemon is stuck in the "D" state, so it can't be terminated. Usually,
once the filesystem has recovered, a process should be able to leave
the D state, so I am not sure what causes this, or whether I can ignore
it without any bad consequences.


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     22465 11.1  1.3 1516668 343624 ?      Dsl  Feb03 18441:31 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf
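
For readers less familiar with the STAT column, the following small Python sketch decodes a value such as "Dsl" character by character. The meanings come from the process state codes listed in the Linux ps(1) man page; the helper name `decode_stat` and the table layout are assumptions made for illustration.

```python
# Meanings as listed under "PROCESS STATE CODES" in the Linux ps(1) man page.
STATE_CODES = {
    "D": "uninterruptible sleep (usually I/O)",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped by job control signal",
    "Z": "defunct (zombie) process",
    "<": "high priority",
    "N": "low priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}


def decode_stat(stat):
    """Translate each character of a ps STAT value into its description."""
    return [STATE_CODES.get(ch, "unknown flag %r" % ch) for ch in stat]


for description in decode_stat("Dsl"):
    print(description)
```

For the process above, this gives uninterruptible sleep, session leader, and multi-threaded. A process in uninterruptible sleep does not act on signals (including SIGKILL) until the blocking call returns, which matches the daemon refusing to terminate.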



BTW, according to lsof, the process is stuck on several files that
report Input/output errors, and the OSD client connections are staying
in CLOSE_WAIT, e.g.:


ceph-osd 22465 root  202u  unknown /current/13.66_head/DIR_6/DIR_E/DIR_A/DIR_7/rbd\udata.112bf2dbc003c.00000000000b385b__head_5A7B7AE6__d (stat: Input/output error)
ceph-osd 22465 root  203u  unknown /current/3.5a_head/DIR_A/DIR_D/DIR_8/rb.0.13a5.2ae8944a.000000140899__head_13C9C8DA__3 (stat: Input/output error)
ceph-osd 22465 root  204u  unknown /current/3.4c_head/DIR_C/DIR_4/DIR_9/rb.0.13a5.2ae8944a.0000000809f1__head_3E44C94C__3 (stat: Input/output error)
ceph-osd 22465 root  205u  unknown /current/13.52_head/DIR_2/DIR_D/DIR_1/DIR_7/rbd\udata.112bf2dbc003c.00000000000b3920__head_C41071D2__d (stat: Input/output error)
ceph-osd 22465 root  206u  unknown /current/13.5c_head/DIR_C/DIR_D/DIR_B/DIR_6/rbd\udata.112bf2dbc003c.00000000000b3922__head_EE946BDC__d (stat: Input/output error)
ceph-osd 22465 root  207u  unknown /current/13.5a_head/DIR_A/DIR_D/DIR_C/DIR_B/rbd\udata.112bf2dbc003c.00000000000b3934__head_031BBCDA__d (stat: Input/output error)
ceph-osd 22465 root  208u  unknown /current/13.27_head/DIR_7/DIR_A/DIR_3/DIR_2/rbd\udata.112bf2dbc003c.00000000000b3928__head_A2CF23A7__d (stat: Input/output error)
ceph-osd 22465 root  209u  unknown /current/13.6f_head/DIR_F/DIR_6/DIR_8/DIR_8/rbd\udata.112bf2dbc003c.00000000000b392a__head_71AA886F__d (stat: Input/output error)
ceph-osd 22465 root  210u  unknown /current/13.66_head/DIR_6/DIR_E/DIR_E/DIR_2/rbd\udata.112bf2dbc003c.00000000000b392c__head_30B22EE6__d (stat: Input/output error)
ceph-osd 22465 root  211u  unknown /current/13.69_head/DIR_9/DIR_E/DIR_9/DIR_C/rbd\udata.112bf2dbc003c.00000000000b3932__head_9C85C9E9__d (stat: Input/output error)
ceph-osd 22465 root  212u  unknown /current/13.51_head/DIR_1/DIR_5/DIR_7/DIR_F/rbd\udata.112bf2dbc003c.00000000000b3700__head_9BE9F751__d (stat: Input/output error)
ceph-osd 22465 root  213u  unknown /current/13.33_head/DIR_3/DIR_3/DIR_5/DIR_D/rbd\udata.112bf2dbc003c.00000000000b372e__head_1033D533__d (stat: Input/output error)
ceph-osd 22465 root  214u  unknown /current/13.2b_head/DIR_B/DIR_2/DIR_0/DIR_8/rbd\udata.117042a014b22.0000000000004c31__head_1E6A802B__d (stat: Input/output error)
ceph-osd 22465 root  215u  unknown /current/13.41_head/DIR_1/DIR_4/DIR_A/DIR_7/rbd\udata.3353793a09e6.0000000000194810__head_ADA57A41__d (stat: Input/output error)
ceph-osd 22465 root  216u  unknown /current/13.5b_head/DIR_B/DIR_5/DIR_A/DIR_D/rbd\udata.3353793a09e6.00000000001936c6__head_C01BDA5B__d (stat: Input/output error)
ceph-osd 22465 root  217u  unknown /current/13.4b_head/DIR_B/DIR_4/DIR_4/DIR_C/rbd\udata.3353793a09e6.0000000000193773__head_014DC44B__d (stat: Input/output error)
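
To see which placement groups the failing descriptors belong to, a rough Python sketch like the one below can group lsof lines by the PG directory in the path. It assumes each entry is on a single line and that the directory under /current/ is named `<pgid>_head`, as in the listing above; the function name `count_errors_by_pg` and the three sample lines (taken from that listing) are only for illustration.

```python
import re
from collections import Counter

# Three entries from the lsof listing above, each joined onto one line.
SAMPLE = [
    r"ceph-osd 22465 root  202u  unknown /current/13.66_head/DIR_6/DIR_E/DIR_A/DIR_7/rbd\udata.112bf2dbc003c.00000000000b385b__head_5A7B7AE6__d (stat: Input/output error)",
    r"ceph-osd 22465 root  203u  unknown /current/3.5a_head/DIR_A/DIR_D/DIR_8/rb.0.13a5.2ae8944a.000000140899__head_13C9C8DA__3 (stat: Input/output error)",
    r"ceph-osd 22465 root  210u  unknown /current/13.66_head/DIR_6/DIR_E/DIR_E/DIR_2/rbd\udata.112bf2dbc003c.00000000000b392c__head_30B22EE6__d (stat: Input/output error)",
]

# Assumed layout: /current/<pool>.<pg in hex>_head/...
PG_DIR = re.compile(r"/current/(\d+\.[0-9a-f]+)_head/")


def count_errors_by_pg(lines):
    """Count lsof entries reporting an I/O error, keyed by placement group id."""
    counts = Counter()
    for line in lines:
        match = PG_DIR.search(line)
        if match and "Input/output error" in line:
            counts[match.group(1)] += 1
    return counts


for pg, count in sorted(count_errors_by_pg(SAMPLE).items()):
    print(pg, count)
```

Feeding it the full lsof output instead of `SAMPLE` would show how the stale descriptors are spread across pools 3 and 13.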


In any case, I would be very grateful if you experts could shed some
light on this.

Our current Ceph version is ceph-0.72.2-0.el6.x86_64, and the backend
filesystem is XFS on fiber direct-attached storage.



Thanks in advance
&
Best regards,
Felix Lee ~

-- 
Felix Lee                               Academia Sinica Grid & Cloud.
Tel: +886-2-27898308
Office: Room P111, Institute of Physics, 128 Academia Road, Section 2, 
Nankang, Taipei 115, Taiwan

