Re: Random OSDs respawning continuously

It's not entirely clear, but it looks like all of the slow ops are your cache pool OSDs trying to promote objects, and your backing pool OSDs aren't fast enough to satisfy all the IO demanded of them. You may be overloading the system.
-Greg
On Fri, Feb 13, 2015 at 6:06 AM Mohamed Pakkeer <mdfakkeer@xxxxxxxxx> wrote:
Hi all,

When I stop the respawning OSD on an OSD node, another OSD on the same node starts respawning. When an OSD starts respawning, it writes the following to its OSD log:

slow request 31.129671 seconds old, received at 2015-02-13 19:09:32.180496: osd_op(osd.551.95229:11191 100000005c4.00000033 [copy-get max 8388608] 13.f4ccd256 RETRY=50 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg

OSD.551 is part of the cache tier. All of the respawning OSDs log the same kind of message, each naming a different cache tier OSD. If I restart all the OSDs on the cache tier OSD node, the respawning stops and the cluster returns to the active+clean state. But when I try to write data to the cluster, a random OSD starts respawning again.

Can anyone help me solve this issue?


2015-02-13 19:10:02.309848 7f53eef54700  0 log_channel(default) log [WRN] : 11 slow requests, 11 included below; oldest blocked for > 30.132629 secs
2015-02-13 19:10:02.309854 7f53eef54700  0 log_channel(default) log [WRN] : slow request 30.132629 seconds old, received at 2015-02-13 19:09:32.177075: osd_op(osd.551.95229:63 100000002ae.00000000 [copy-from ver 7622] 13.7273b256 RETRY=130 snapc 1=[] ondisk+retry+write+ignore_overlay+enforce_snapc+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309858 7f53eef54700  0 log_channel(default) log [WRN] : slow request 30.131608 seconds old, received at 2015-02-13 19:09:32.178096: osd_op(osd.551.95229:415 100000003a0.00000006 [copy-get max 8388608] 13.aefb256 RETRY=118 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309861 7f53eef54700  0 log_channel(default) log [WRN] : slow request 30.130994 seconds old, received at 2015-02-13 19:09:32.178710: osd_op(osd.551.95229:2683 1000000029d.0000003b [copy-get max 8388608] 13.a2be1256 RETRY=115 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309864 7f53eef54700  0 log_channel(default) log [WRN] : slow request 30.130426 seconds old, received at 2015-02-13 19:09:32.179278: osd_op(osd.551.95229:3939 100000004e9.00000032 [copy-get max 8388608] 13.6a25b256 RETRY=105 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:02.309868 7f53eef54700  0 log_channel(default) log [WRN] : slow request 30.129697 seconds old, received at 2015-02-13 19:09:32.180007: osd_op(osd.551.95229:9749 10000000553.0000007e [copy-get max 8388608] 13.c8645256 RETRY=59 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310284 7f53eef54700  0 log_channel(default) log [WRN] : 11 slow requests, 6 included below; oldest blocked for > 31.133092 secs
2015-02-13 19:10:03.310305 7f53eef54700  0 log_channel(default) log [WRN] : slow request 31.129671 seconds old, received at 2015-02-13 19:09:32.180496: osd_op(osd.551.95229:11191 100000005c4.00000033 [copy-get max 8388608] 13.f4ccd256 RETRY=50 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310308 7f53eef54700  0 log_channel(default) log [WRN] : slow request 31.128616 seconds old, received at 2015-02-13 19:09:32.181551: osd_op(osd.551.95229:12903 100000002e4.000000d6 [copy-get max 8388608] 13.f56a3256 RETRY=41 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310322 7f53eef54700  0 log_channel(default) log [WRN] : slow request 31.127807 seconds old, received at 2015-02-13 19:09:32.182360: osd_op(osd.551.95229:14165 10000000480.00000110 [copy-get max 8388608] 13.fd8c1256 RETRY=32 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310327 7f53eef54700  0 log_channel(default) log [WRN] : slow request 31.127320 seconds old, received at 2015-02-13 19:09:32.182847: osd_op(osd.551.95229:15013 1000000047f.00000133 [copy-get max 8388608] 13.b7b05256 RETRY=27 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:03.310331 7f53eef54700  0 log_channel(default) log [WRN] : slow request 31.126935 seconds old, received at 2015-02-13 19:09:32.183232: osd_op(osd.551.95229:15767 1000000066d.0000001e [copy-get max 8388608] 13.3b017256 RETRY=25 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
2015-02-13 19:10:04.310685 7f53eef54700  0 log_channel(default) log [WRN] : 11 slow requests, 1 included below; oldest blocked for > 32.133566 secs
2015-02-13 19:10:04.310705 7f53eef54700  0 log_channel(default) log [WRN] : slow request 32.126584 seconds old, received at 2015-02-13 19:09:32.184057: osd_op(osd.551.95229:16293 10000000601.00000029 [copy-get max 8388608] 13.293e1256 RETRY=25 ack+retry+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e95518) currently reached_pg
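
To see at a glance which OSDs are sending the slow ops and what kind of ops they are, the warnings above can be tallied with a short script. This is only a rough sketch: it assumes every log entry sits on a single unwrapped line in the format shown above, and the names `SLOW_RE` and `summarize` are made up for illustration and are not part of any Ceph tool.

```python
import re
from collections import Counter

# Pulls the sender (e.g. "osd.551") and the first bracketed op name
# (e.g. "copy-get") out of a "slow request" warning line.
# Assumption: the entry is on one line, not wrapped by the terminal.
SLOW_RE = re.compile(
    r"slow request [\d.]+ seconds old.*?"
    r"osd_op\((?P<src>\S+?)\.\d+:\d+\s+\S+\s+\[(?P<op>[\w-]+)"
)


def summarize(lines):
    """Count slow requests per sender and per op type."""
    by_source, by_op = Counter(), Counter()
    for line in lines:
        match = SLOW_RE.search(line)
        if match:
            by_source[match.group("src")] += 1
            by_op[match.group("op")] += 1
    return by_source, by_op


# Two entries taken from the log above, with the wrapped lines rejoined.
sample = [
    "2015-02-13 19:10:02.309854 7f53eef54700  0 log_channel(default) log [WRN] : "
    "slow request 30.132629 seconds old, received at 2015-02-13 19:09:32.177075: "
    "osd_op(osd.551.95229:63 100000002ae.00000000 [copy-from ver 7622] 13.7273b256 "
    "RETRY=130 snapc 1=[] ondisk+retry+write+ignore_overlay+enforce_snapc+"
    "known_if_redirected e95518) currently reached_pg",
    "2015-02-13 19:10:04.310705 7f53eef54700  0 log_channel(default) log [WRN] : "
    "slow request 32.126584 seconds old, received at 2015-02-13 19:09:32.184057: "
    "osd_op(osd.551.95229:16293 10000000601.00000029 [copy-get max 8388608] "
    "13.293e1256 RETRY=25 ack+retry+read+ignore_cache+ignore_overlay+"
    "map_snap_clone+known_if_redirected e95518) currently reached_pg",
]

by_source, by_op = summarize(sample)
print(by_source.most_common())
print(by_op.most_common())
```

Feeding it a whole OSD log (any iterable of lines works, including an open file) gives the same two tallies for every slow request in that log.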




2015-02-13 19:10:05.967407 7f4411770900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 2071712
2015-02-13 19:10:05.971917 7f4411770900  0 filestore(/var/lib/ceph/osd/ceph-403) backend xfs (magic 0x58465342)
2015-02-13 19:10:05.971936 7f4411770900  1 filestore(/var/lib/ceph/osd/ceph-403)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-13 19:10:06.009745 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-13 19:10:06.009786 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-13 19:10:06.026282 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-13 19:10:06.026421 7f4411770900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_feature: extsize is disabled by conf
2015-02-13 19:10:06.178991 7f4411770900  0 filestore(/var/lib/ceph/osd/ceph-403) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-13 19:10:06.186378 7f4411770900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:06.248640 7f4411770900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:06.377309 7f4411770900  1 journal close /var/lib/ceph/osd/ceph-403/journal
2015-02-13 19:10:06.449653 7f4411770900  0 filestore(/var/lib/ceph/osd/ceph-403) backend xfs (magic 0x58465342)
2015-02-13 19:10:06.510328 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-13 19:10:06.510362 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-13 19:10:06.560259 7f4411770900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-13 19:10:06.560353 7f4411770900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_feature: extsize is disabled by conf
2015-02-13 19:10:06.653577 7f4411770900  0 filestore(/var/lib/ceph/osd/ceph-403) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-13 19:10:06.659761 7f4411770900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:06.706124 7f4411770900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:06.707848 7f4411770900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-13 19:10:06.718958 7f4411770900  0 osd.403 95523 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-13 19:10:06.718994 7f4411770900  0 osd.403 95523 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-13 19:10:06.719003 7f4411770900  0 osd.403 95523 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-13 19:10:06.719047 7f4411770900  0 osd.403 95523 load_pgs
2015-02-13 19:10:07.289273 7f4411770900  0 osd.403 95523 load_pgs opened 187 pgs
2015-02-13 19:10:07.290528 7f4411770900 -1 osd.403 95523 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-13 19:10:07.299139 7f43fe0d1700  0 osd.403 95523 ignoring osdmap until we have initialized
2015-02-13 19:10:07.299273 7f43fe0d1700  0 osd.403 95523 ignoring osdmap until we have initialized
2015-02-13 19:10:07.367439 7f4411770900  0 osd.403 95523 done with init, starting boot process
2015-02-13 19:10:09.628008 7f43c2b3d700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.5:6938/15118444 pipe(0xc4d59c0 sd=459 :6836 s=0 pgs=0 cs=0 l=0 c=0xbd78c60).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.633725 7f43c3c4e700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.13:6810/9067610 pipe(0xcf7cb00 sd=436 :6836 s=0 pgs=0 cs=0 l=0 c=0xd006ec0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.670055 7f43b7f92700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.5:6802/10118805 pipe(0xd23b9c0 sd=539 :6836 s=0 pgs=0 cs=0 l=0 c=0xd1a5b20).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.675371 7f43ba2b5700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.8:6930/8115813 pipe(0xd6c7440 sd=522 :6836 s=0 pgs=0 cs=0 l=0 c=0xd16f180).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.679692 7f43b7487700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.4:6886/11127316 pipe(0xd23a3c0 sd=546 :6836 s=0 pgs=0 cs=0 l=0 c=0xd1a5440).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.708472 7f43b3e51700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.13:6879/12175877 pipe(0xd9da100 sd=570 :6836 s=0 pgs=0 cs=0 l=0 c=0xda589a0).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.717141 7f43b0f22700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.7:6819/11132701 pipe(0xd8a5180 sd=596 :6836 s=0 pgs=0 cs=0 l=0 c=0xe251080).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.721672 7f43aff12700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.6:6804/19191298 pipe(0xd8a3340 sd=603 :6836 s=0 pgs=0 cs=0 l=0 c=0xe250b00).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.730813 7f43b1326700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.7:6825/1301227 pipe(0xe0c1c80 sd=593 :6836 s=0 pgs=0 cs=0 l=0 c=0xe2514a0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.879344 7f43ad8ec700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.8:6845/15123594 pipe(0xe0bfb80 sd=621 :6836 s=0 pgs=0 cs=0 l=0 c=0xe250160).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.888010 7f43ab8cc700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.2:6832/9203280 pipe(0xce6f8c0 sd=648 :6836 s=0 pgs=0 cs=0 l=0 c=0xe85c100).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.897543 7f43a4559700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.6:6916/10181510 pipe(0xd975b80 sd=699 :6836 s=0 pgs=0 cs=0 l=0 c=0xe913c80).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:09.901181 7f43a1c30700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.6:6872/17198411 pipe(0xe9d4ec0 sd=715 :6836 s=0 pgs=0 cs=0 l=0 c=0xed53340).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.904586 7f43a1a2e700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6816/14116404 pipe(0xe9d4940 sd=717 :6836 s=0 pgs=0 cs=0 l=0 c=0xed53080).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.910772 7f43a071b700  0 -- 10.1.100.14:6836/2071712 >> :/0 pipe(0xe9d4680 sd=721 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52f20).accept failed to getpeername (107) Transport endpoint is not connected
2015-02-13 19:10:09.959742 7f439fd11700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6835/17116573 pipe(0xe9d43c0 sd=727 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52dc0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:09.991344 7f439c4a6700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.6:6913/14182697 pipe(0xe9d3600 sd=756 :6836 s=0 pgs=0 cs=0 l=0 c=0xed526e0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:10.099747 7f43a4256700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.13:6843/15181065 pipe(0xd975340 sd=702 :6836 s=0 pgs=0 cs=0 l=0 c=0xe913860).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:10.246934 7f43919fc700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6823/13119018 pipe(0xe9d3340 sd=840 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52580).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:10.305592 7f4390aed700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6922/10112411 pipe(0xe9d3080 sd=848 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52420).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:10.447464 7f438d0b3700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6839/13117552 pipe(0xe9d2dc0 sd=876 :6836 s=0 pgs=0 cs=0 l=0 c=0xed522c0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:10.528647 7f438c1a4700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.1:6841/10118584 pipe(0xe9d2b00 sd=884 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52160).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:10.647182 7f4365e43700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.13:6936/10179964 pipe(0xe9d2840 sd=1229 :6836 s=0 pgs=0 cs=0 l=0 c=0xed52000).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:10.763373 7f43619ff700  0 -- 10.1.100.14:6836/2071712 >> 10.1.100.13:6806/14167598 pipe(0xe9d2580 sd=1243 :6836 s=0 pgs=0 cs=0 l=0 c=0xa5f4940).accept connect_seq 0 vs existing 0 state wait
2015-02-13 19:10:35.004540 7f2e9759b900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 2074180
2015-02-13 19:10:35.008746 7f2e9759b900  0 filestore(/var/lib/ceph/osd/ceph-403) backend xfs (magic 0x58465342)
2015-02-13 19:10:35.008768 7f2e9759b900  1 filestore(/var/lib/ceph/osd/ceph-403)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-13 19:10:35.035532 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-13 19:10:35.035622 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-13 19:10:35.068698 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-13 19:10:35.068826 7f2e9759b900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_feature: extsize is disabled by conf
2015-02-13 19:10:35.204041 7f2e9759b900  0 filestore(/var/lib/ceph/osd/ceph-403) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-13 19:10:35.211697 7f2e9759b900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:35.257182 7f2e9759b900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:35.419868 7f2e9759b900  1 journal close /var/lib/ceph/osd/ceph-403/journal
2015-02-13 19:10:35.447009 7f2e9759b900  0 filestore(/var/lib/ceph/osd/ceph-403) backend xfs (magic 0x58465342)
2015-02-13 19:10:35.502898 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-13 19:10:35.502929 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-13 19:10:35.552837 7f2e9759b900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-13 19:10:35.552945 7f2e9759b900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-403) detect_feature: extsize is disabled by conf
2015-02-13 19:10:35.663059 7f2e9759b900  0 filestore(/var/lib/ceph/osd/ceph-403) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-13 19:10:35.669623 7f2e9759b900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:35.714111 7f2e9759b900  1 journal _open /var/lib/ceph/osd/ceph-403/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-13 19:10:35.715330 7f2e9759b900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-13 19:10:35.722675 7f2e9759b900  0 osd.403 95527 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-13 19:10:35.722703 7f2e9759b900  0 osd.403 95527 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-13 19:10:35.722708 7f2e9759b900  0 osd.403 95527 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-13 19:10:35.722728 7f2e9759b900  0 osd.403 95527 load_pgs
2015-02-13 19:10:36.230034 7f2e9759b900  0 osd.403 95527 load_pgs opened 187 pgs
2015-02-13 19:10:36.231327 7f2e9759b900 -1 osd.403 95527 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-13 19:10:36.239635 7f2e83cef700  0 osd.403 95527 ignoring osdmap until we have initialized
2015-02-13 19:10:36.247880 7f2e83cef700  0 osd.403 95527 ignoring osdmap until we have initialized
2015-02-13 19:10:36.322880 7f2e9759b900  0 osd.403 95527 done with init, starting boot process
2015-02-13 19:10:38.395813 7f2e503d7700  0 -- 10.1.100.14:6838/2074180 >> 10.1.100.11:6858/4560 pipe(0xb58db80 sd=397 :6838 s=0 pgs=0 cs=0 l=0 c=0xb4652e0).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:38.448288 7f2e43f13700  0 -- 10.1.100.14:6838/2074180 >> 10.1.100.15:6840/7116025 pipe(0xb045600 sd=506 :6838 s=0 pgs=0 cs=0 l=0 c=0xc59c580).accept connect_seq 0 vs existing 0 state connecting
2015-02-13 19:10:38.505886 7f2e3b98e700  0 -- 10.1.100.14:6838/2074180 >> 10.1.100.2:6831/14199331 pipe(0xbe4a940 sd=585 :6838 s=0 pgs=0 cs=0 l=0 c=0xafb4580).accept connect_s
--More--

 Regards    
K.Mohamed Pakkeer

On Thu, Feb 12, 2015 at 8:31 PM, Mohamed Pakkeer <mdfakkeer@xxxxxxxxx> wrote:
Hi all,

Cluster: 540 OSDs, cache tier and EC pool
ceph version 0.87


cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
     health HEALTH_WARN 10 pgs peering; 21 pgs stale; 2 pgs stuck inactive; 2 pgs stuck unclean; 287 requests are blocked > 32 sec; recovery 24/6707031 objects degraded (0.000%); too few pgs per osd (13 < min 20); 1/552 in osds are down; clock skew detected on mon.master02, mon.master03
     monmap e3: 3 mons at {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0}, election epoch 4, quorum 0,1,2 master01,master02,master03
     mdsmap e17: 1/1/1 up {0=master01=up:active}
     osdmap e57805: 552 osds: 551 up, 552 in
      pgmap v278604: 7264 pgs, 3 pools, 2027 GB data, 547 kobjects
            3811 GB used, 1958 TB / 1962 TB avail
            24/6707031 objects degraded (0.000%)
                   7 stale+peering
                   3 peering
                7240 active+clean
                  13 stale
                   1 stale+active
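
The `too few pgs per osd (13 < min 20)` warning in the status above can be reproduced with simple arithmetic. A small sketch, assuming the monitor compares the total PG count divided by the number of `in` OSDs (integer division) against the minimum; that formula is an assumption here, although the reported 13 is consistent with it. The helper names are illustrative only.

```python
def pgs_per_osd(total_pgs, osds_in):
    """PGs per OSD, rounded down (assumed to mirror the health check)."""
    return total_pgs // osds_in


def min_total_pgs(osds_in, warn_min):
    """Smallest total PG count that would reach warn_min PGs per OSD."""
    return osds_in * warn_min


# Figures from the status output: 7264 pgs, 552 osds in, minimum of 20.
per_osd = pgs_per_osd(7264, 552)
needed = min_total_pgs(552, 20)
print(per_osd)  # 13, matching the warning
print(needed)   # 11040
```

Under that assumption, the three pools together would need at least 11040 PGs before the warning clears.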




We have mounted Ceph using the ceph-fuse client. Suddenly some of the OSDs started respawning continuously, and the cluster health is still unstable. How can we stop the OSDs from respawning?



2015-02-12 18:41:51.562337 7f8371373900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 3911
2015-02-12 18:41:51.564781 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:51.564792 7f8371373900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:41:51.655623 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:51.655639 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:51.663864 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:51.663910 7f8371373900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:51.994021 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:41:52.788178 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.848430 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.922806 7f8371373900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:41:52.948320 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:52.981122 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:52.981137 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:52.989395 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:52.989440 7f8371373900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:53.149095 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:41:53.154258 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.217404 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.467512 7f8371373900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:41:53.563846 7f8371373900  0 osd.538 54486 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:41:53.563865 7f8371373900  0 osd.538 54486 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:41:53.563869 7f8371373900  0 osd.538 54486 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:41:53.563888 7f8371373900  0 osd.538 54486 load_pgs
2015-02-12 18:41:55.430730 7f8371373900  0 osd.538 54486 load_pgs opened 137 pgs
2015-02-12 18:41:55.432854 7f8371373900 -1 osd.538 54486 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-12 18:41:55.442748 7f835dfc8700  0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.456802 7f835dfc8700  0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.590831 7f8371373900  0 osd.538 54486 done with init, starting boot process
2015-02-12 18:42:08.601833 7f830cead700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.4:6843/4178616 pipe(0x12528680 sd=495 :0 s=1 pgs=0 cs=0 l=0 c=0x10246680).fault with nothing to send, going to standby
2015-02-12 18:42:10.460257 7f830be70700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.14:6806/3483 pipe(0x12528680 sd=536 :0 s=1 pgs=0 cs=0 l=0 c=0x10b612e0).fault with nothing to send, going to standby
2015-02-12 18:42:20.012175 7f830be70700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.14:6806/3483 pipe(0x12528680 sd=536 :0 s=1 pgs=0 cs=1 l=0 c=0x10b612e0).fault
2015-02-12 18:42:20.038834 7f82f1a9e700  0 -- 10.1.2.14:0/3911 >> 10.1.2.14:6810/3483 pipe(0x12324ec0 sd=844 :0 s=1 pgs=0 cs=0 l=1 c=0x1231dc80).fault
2015-02-12 18:42:20.045447 7f82f1b9f700  0 -- 10.1.2.14:0/3911 >> 10.1.100.14:6807/3483 pipe(0x12325180 sd=846 :0 s=1 pgs=0 cs=0 l=1 c=0x1231dde0).fault
2015-02-12 18:42:49.094270 7f836797c700 -1 osd.538 54728 heartbeat_check: no reply from osd.176 since back 2015-02-12 18:42:28.444361 front 2015-02-12 18:42:28.444361 (cutoff 2015-02-12 18:42:29.094265)
2015-02-12 18:42:49.622922 7f834cfa6700 -1 osd.538 54728 heartbeat_check: no reply from osd.176 since back 2015-02-12 18:42:33.345980 front 2015-02-12 18:42:28.444361 (cutoff 2015-02-12 18:42:29.622919)
2015-02-12 18:42:51.094801 7f836797c700  0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.960507 secs
2015-02-12 18:42:51.094825 7f836797c700  0 log_channel(default) log [WRN] : slow request 30.960507 seconds old, received at 2015-02-12 18:42:20.134236: osd_op(osd.542.54048:1 100000002ae.00000000 [copy-from ver 7622] 13.7273b256 RETRY=513 snapc 1=[] ondisk+retry+write+ignore_overlay+enforce_snapc+known_if_redirected e54708) currently reached_pg



2015-02-12 18:42:53.354106 7f9655242900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 84689
2015-02-12 18:42:53.359088 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:53.359116 7f9655242900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:42:53.395684 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:53.395711 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:53.445563 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:53.445652 7f9655242900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:53.579957 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:42:53.584720 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.626940 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.704585 7f9655242900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:42:53.734618 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:53.771148 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:53.771179 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:53.779389 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:53.779449 7f9655242900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:53.913933 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:42:53.918308 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.951526 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.952920 7f9655242900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:42:53.959341 7f9655242900  0 osd.538 54728 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:42:53.959356 7f9655242900  0 osd.538 54728 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:42:53.959360 7f9655242900  0 osd.538 54728 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:42:53.959378 7f9655242900  0 osd.538 54728 load_pgs
2015-02-12 18:42:54.306386 7f9655242900  0 osd.538 54728 load_pgs opened 137 pgs
2015-02-12 18:42:54.307429 7f9655242900 -1 osd.538 54728 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-12 18:42:54.314711 7f9641cb7700  0 osd.538 54728 ignoring osdmap until we have initialized
2015-02-12 18:42:54.314749 7f9641cb7700  0 osd.538 54728 ignoring osdmap until we have initialized
2015-02-12 18:42:54.371560 7f9655242900  0 osd.538 54728 done with init, starting boot process
2015-02-12 18:42:56.079385 7f95fde9b700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.4:6861/15126717 pipe(0xacf5340 sd=504 :6874 s=0 pgs=0 cs=0 l=0 c=0x9d72c60).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.160775 7f95eecaa700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.5:6942/14126479 pipe(0xa814840 sd=624 :6874 s=0 pgs=0 cs=0 l=0 c=0xb321a20).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.170650 7f96000bd700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.13:6808/14152675 pipe(0xaa41340 sd=486 :6874 s=0 pgs=0 cs=0 l=0 c=0xa8d9b20).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.215545 7f95e7533700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.13:6903/11158823 pipe(0xb1da100 sd=683 :6874 s=0 pgs=0 cs=0 l=0 c=0xaea0260).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.222787 7f95e712f700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.11:6831/10414111 pipe(0xb1d98c0 sd=686 :6874 s=0 pgs=0 cs=0 l=0 c=0xae9fe40).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.471608 7f95d6a29700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.6:6872/17198411 pipe(0xb593600 sd=813 :6874 s=0 pgs=0 cs=0 l=0 c=0xaf41b80).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.551898 7f95d4403700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.1:6835/17116573 pipe(0xb593080 sd=832 :6874 s=0 pgs=0 cs=0 l=0 c=0xaf418c0).accept connect_seq 0 vs existing 0 state wait


2015-02-12 18:42:59.123753 7f7175bf7900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 86860
2015-02-12 18:42:59.128606 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:59.128620 7f7175bf7900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:42:59.202824 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:59.202851 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:59.402460 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:59.402541 7f7175bf7900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:59.571199 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:42:59.576472 7f7175bf7900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:59.983516 7f7175bf7900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:43:00.245124 7f7175bf7900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:43:00.348046 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:43:00.396662 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:43:00.396682 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

[ 6216.144870] init: ceph-osd (ceph/538) main process ended, respawning
[ 6223.548268] init: ceph-osd (ceph/538) main process (1035681) terminated with status 1
[ 6223.548295] init: ceph-osd (ceph/538) main process ended, respawning
[ 6230.306315] init: ceph-osd (ceph/538) main process (1037980) terminated with status 1
[ 6230.306337] init: ceph-osd (ceph/538) main process ended, respawning
[ 6239.132669] init: ceph-osd (ceph/538) main process (1040206) terminated with status 1
[ 6239.132687] init: ceph-osd (ceph/538) main process ended, respawning
[ 6245.699440] init: ceph-osd (ceph/538) main process (1042452) terminated with status 1
[ 6245.699463] init: ceph-osd (ceph/538) main process ended, respawning
[ 6254.057325] init: ceph-osd (ceph/538) main process (1044412) terminated with status 1
[ 6254.057342] init: ceph-osd (ceph/538) main process ended, respawning
[ 6261.686181] init: ceph-osd (ceph/538) main process (1046709) terminated with status 1
[ 6261.686198] init: ceph-osd (ceph/538) main process ended, respawning
[ 6269.204085] init: ceph-osd (ceph/538) main process (1049003) terminated with status 1
[ 6269.204102] init: ceph-osd (ceph/538) main process ended, respawning
[ 6276.458609] init: ceph-osd (ceph/538) main process (1051292) terminated with status 1
[ 6276.458634] init: ceph-osd (ceph/538) main process ended, respawning
[ 6283.972596] init: ceph-osd (ceph/538) main process (1053612) terminated with status 1
[ 6283.972617] init: ceph-osd (ceph/538) main process ended, respawning
[ 6291.281523] init: ceph-osd (ceph/538) main process (1055886) terminated with status 1
[ 6291.281548] init: ceph-osd (ceph/538) main process ended, respawning
[ 6299.595198] init: ceph-osd (ceph/538) main process (1058175) terminated with status 1
[ 6299.595217] init: ceph-osd (ceph/538) main process ended, respawning
[ 6307.142994] init: ceph-osd (ceph/538) main process (1060419) terminated with status 1
[ 6307.143013] init: ceph-osd (ceph/538) main process ended, respawning


-- 
 Regards   
K.Mohamed Pakkeer





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
