osd down

Hello all,
Today we encountered a strange problem.
We were writing data into the OSD cluster. In the beginning it worked well,
but a few hours later the client could not write any more data.
"ceph -s" showed an OSD down, and we could not even ssh into that OSD host;
the keyboard did not respond and the screen went dark.
After rebooting it manually, we found the following in the kern.log of the
down OSD:
…………
Jul 28 11:25:40 T02-OSD152 kernel: [ 4393.176941] r8169 0000:01:00.0:
eth0: link up
Jul 28 11:26:00 T02-OSD152 kernel: [ 4413.166737] r8169 0000:01:00.0:
eth0: link up
Jul 28 11:26:00 T02-OSD152 kernel: [ 4413.426215] r8169 0000:01:00.0:
eth0: link up
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@6802/3615
took stat stat(2011-07-23 13:45:26.302800 oprate=0 qlen=0
recent_qlen=0 rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:26.171417 7f754f9bc700 osd0 7 take_peer_stat peer
osd1 stat(2011-07-23 13:45:26.302800 oprate=0 qlen=0 recent_qlen=0
rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:26.171423 7f754f9bc700 osd0 7 _share_map_outgoing
osd1 192.168.0.155:6801/3615 already has epoch 7
2011-07-23 13:45:26.988122 7f754f9bc700 -- 192.168.0.152:6802/3446 <==
osd2 192.168.0.156:6802/3527 15264 ==== osd_ping(e7 as_of 7 heartbeat)
v1 ==== 61+0+0 (2575005658 0 0) 0x2514000 con 0x186bc80
2011-07-23 13:45:26.988166 7f754f9bc700 osd0 7 handle_osd_ping osd2
192.168.0.156:6802/3527 took stat stat(2011-07-23 13:45:27.115844
oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:26.988180 7f754f9bc700 osd0 7 take_peer_stat peer
osd2 stat(2011-07-23 13:45:27.115844 oprate=0 qlen=0 recent_qlen=0
rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:26.988187 7f754f9bc700 osd0 7 _share_map_outgoing
osd2 192.168.0.156:6801/3527 already has epoch 7
2011-07-23 13:45:27.009638 7f75559c8700 osd0 7 tick
2011-07-23 13:45:27.009695 7f75559c8700 osd0 7 scrub_should_schedule
loadavg 0.03 < max 0.5 = no, randomly backing off
2011-07-23 13:45:27.094897 7f75541c5700 filestore(/data/osd0)
sync_entry woke after 1.000086
2011-07-23 13:45:27.094920 7f75541c5700 journal commit_start op_seq
3393, applied_seq 3393, committed_seq 3393
2011-07-23 13:45:27.094935 7f75541c5700 journal commit_start nothing to do
2011-07-23 13:45:27.094951 7f75541c5700 filestore(/data/osd0)
sync_entry waiting for max_interval 1.000000
2011-07-23 13:45:27.171283 7f754f9bc700 -- 192.168.0.152:6802/3446 <==
osd1 192.168.0.155:6802/3615 15248 ==== osd_ping(e7 as_of 7 heartbeat)
v1 ==== 61+0+0 (1684980217 0 0) 0x254c000 con 0x186b140
2011-07-23 13:45:27.171329 7f754f9bc700 osd0 7 handle_osd_ping osd1
192.168.0.155:6802/3615 took stat stat(2011-07-23 13:45:27.303017
oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:27.171343 7f754f9bc700 osd0 7 take_peer_stat peer
osd1 stat(2011-07-23 13:45:27.303017 oprate=0 qlen=0 recent_qlen=0
rdlat=0 / 0 fshedin=0)
2011-07-23 13:45:27.171350 7f754f9bc700 osd0 7 _share_map_outgoing
osd1 192.1Jul 28 14:50:56 T02-OSD152 kernel: imklog 3.18.6, log source
= /proc/kmsg started.
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000] Initializing cgroup
subsys cpuset
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000] Initializing cgroup subsys cpu
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000] Linux version
2.6.37.6 (root@T02-OSD151) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1
SMP Mon Jul 18 10:23:56 CST 2011
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000] Command line:
root=/dev/sda2 quiet vga=788 splash ro
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000] BIOS-provided
physical RAM map:
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
0000000000000000 - 000000000009dc00 (usable)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
000000000009dc00 - 00000000000a0000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000000e4000 - 0000000000100000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
0000000000100000 - 00000000dcf70000 (usable)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000dcf70000 - 00000000dcf88000 (ACPI data)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000dcf88000 - 00000000dcfdc000 (ACPI NVS)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000dcfdc000 - 00000000dd800000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000dde00000 - 00000000e0000000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000fee00000 - 00000000fee01000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
00000000ff800000 - 0000000100000000 (reserved)
Jul 28 14:50:56 T02-OSD152 kernel: [    0.000000]  BIOS-e820:
0000000100000000 - 0000000118000000 (usable)
…………
We are confused by this output: how did OSD log messages end up in
kern.log? By the way, we searched Google for "kernel: imklog 3.18.6, log
source = /proc/kmsg started." and the results say it relates to the syslog
daemon. We run the rsyslogd daemon on all OSDs to back up the OSD debug log.
I'm not sure whether this caused the OSD to go down.
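
To show exactly where the file changes from kernel messages to NUL bytes to OSD debug output, a small script along these lines could be run against a copy of kern.log. It is only a sketch: the two line patterns are inferred from the excerpt above, and the 16-byte threshold for a "run" of NULs and the function names are arbitrary choices for illustration, not anything from Ceph or rsyslog.

```python
import re

# Syslog-style kernel line, e.g. "Jul 28 11:25:40 T02-OSD152 kernel: ..."
# (pattern inferred from the excerpt; an assumption, not an rsyslog spec)
KERNEL_RE = re.compile(rb"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \S+ kernel: ")

# OSD debug line, e.g. "2011-07-23 13:45:27.009638 7f75559c8700 osd0 7 tick"
# (timestamp followed by a hex thread id, again inferred from the excerpt)
OSD_RE = re.compile(rb"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+ [0-9a-f]+ ")

# Treat 16 or more consecutive NUL bytes as a run worth reporting
# (the threshold is an arbitrary choice).
NUL_RUN_RE = re.compile(rb"\x00{16,}")


def find_nul_runs(data):
    """Return a list of (byte_offset, length) for each long run of NUL bytes."""
    return [(m.start(), m.end() - m.start()) for m in NUL_RUN_RE.finditer(data)]


def classify_lines(data):
    """Count kernel lines, OSD debug lines, and everything else.

    NUL bytes are stripped first so that text following a NUL run is
    still examined as a line of its own.
    """
    counts = {"kernel": 0, "osd": 0, "other": 0}
    for line in data.replace(b"\x00", b"").splitlines():
        if KERNEL_RE.match(line):
            counts["kernel"] += 1
        elif OSD_RE.match(line):
            counts["osd"] += 1
        else:
            counts["other"] += 1
    return counts
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes to both functions gives the offsets of the NUL runs and a rough count of how much of the file is OSD output. If the OSD lines appear only after a NUL run, that would suggest the file contents were damaged at the time of the hang rather than rsyslogd routing OSD messages into kern.log, but the script alone cannot confirm either explanation.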

