Random OSDs respawning continuously

Hi all,

Cluster: 540 OSDs, cache tier and EC pool
Ceph version: 0.87


cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
     health HEALTH_WARN 10 pgs peering; 21 pgs stale; 2 pgs stuck inactive; 2 pgs stuck unclean; 287 requests are blocked > 32 sec; recovery 24/6707031 objects degraded (0.000%); too few pgs per osd (13 < min 20); 1/552 in osds are down; clock skew detected on mon.master02, mon.master03
     monmap e3: 3 mons at {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0}, election epoch 4, quorum 0,1,2 master01,master02,master03
     mdsmap e17: 1/1/1 up {0=master01=up:active}
     osdmap e57805: 552 osds: 551 up, 552 in
      pgmap v278604: 7264 pgs, 3 pools, 2027 GB data, 547 kobjects
            3811 GB used, 1958 TB / 1962 TB avail
            24/6707031 objects degraded (0.000%)
                   7 stale+peering
                   3 peering
                7240 active+clean
                  13 stale
                   1 stale+active
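The HEALTH_WARN counts can be cross-checked against the pgmap state counts listed just above. The sketch below is only an illustration and makes two assumptions: that each "N pgs <state>" figure counts the PGs whose state contains that flag, and that "pgs per osd" is the total PG count divided by the number of in OSDs using integer division. `PG_STATES`, `IN_OSDS` and `count_with` are hypothetical names; the numbers are copied from the status output.

```python
# PG state counts copied from the pgmap section of the status output above.
PG_STATES = {
    "stale+peering": 7,
    "peering": 3,
    "active+clean": 7240,
    "stale": 13,
    "stale+active": 1,
}
IN_OSDS = 552  # from "osdmap e57805: 552 osds: 551 up, 552 in"


def count_with(flag: str) -> int:
    """Count PGs whose '+'-separated state list contains the given flag."""
    return sum(n for state, n in PG_STATES.items() if flag in state.split("+"))


total = sum(PG_STATES.values())
not_clean = total - PG_STATES["active+clean"]
# Assumption: integer division of total PGs by in OSDs (compared against "min 20").
per_osd = total // IN_OSDS

print(f"total={total} not_active_clean={not_clean} "
      f"stale={count_with('stale')} peering={count_with('peering')} "
      f"per_osd={per_osd}")
```

Running it prints `total=7264 not_active_clean=24 stale=21 peering=10 per_osd=13`, which matches "7264 pgs", "21 pgs stale", "10 pgs peering" and "too few pgs per osd (13 < min 20)" in the health line.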




We have mounted Ceph using the ceph-fuse client. Suddenly, some of the OSDs started respawning continuously, and the cluster health is still unstable. How can we stop the OSDs from respawning?



2015-02-12 18:41:51.562337 7f8371373900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 3911
2015-02-12 18:41:51.564781 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:51.564792 7f8371373900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:41:51.655623 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:51.655639 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:51.663864 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:51.663910 7f8371373900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:51.994021 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:41:52.788178 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.848430 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.922806 7f8371373900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:41:52.948320 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:52.981122 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:52.981137 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:52.989395 7f8371373900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:52.989440 7f8371373900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:53.149095 7f8371373900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:41:53.154258 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.217404 7f8371373900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.467512 7f8371373900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:41:53.563846 7f8371373900  0 osd.538 54486 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:41:53.563865 7f8371373900  0 osd.538 54486 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:41:53.563869 7f8371373900  0 osd.538 54486 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:41:53.563888 7f8371373900  0 osd.538 54486 load_pgs
2015-02-12 18:41:55.430730 7f8371373900  0 osd.538 54486 load_pgs opened 137 pgs
2015-02-12 18:41:55.432854 7f8371373900 -1 osd.538 54486 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-12 18:41:55.442748 7f835dfc8700  0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.456802 7f835dfc8700  0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.590831 7f8371373900  0 osd.538 54486 done with init, starting boot process
2015-02-12 18:42:08.601833 7f830cead700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.4:6843/4178616 pipe(0x12528680 sd=495 :0 s=1 pgs=0 cs=0 l=0 c=0x10246680).fault with nothing to send, going to standby
2015-02-12 18:42:10.460257 7f830be70700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.14:6806/3483 pipe(0x12528680 sd=536 :0 s=1 pgs=0 cs=0 l=0 c=0x10b612e0).fault with nothing to send, going to standby
2015-02-12 18:42:20.012175 7f830be70700  0 -- 10.1.100.14:6836/3911 >> 10.1.100.14:6806/3483 pipe(0x12528680 sd=536 :0 s=1 pgs=0 cs=1 l=0 c=0x10b612e0).fault
2015-02-12 18:42:20.038834 7f82f1a9e700  0 -- 10.1.2.14:0/3911 >> 10.1.2.14:6810/3483 pipe(0x12324ec0 sd=844 :0 s=1 pgs=0 cs=0 l=1 c=0x1231dc80).fault
2015-02-12 18:42:20.045447 7f82f1b9f700  0 -- 10.1.2.14:0/3911 >> 10.1.100.14:6807/3483 pipe(0x12325180 sd=846 :0 s=1 pgs=0 cs=0 l=1 c=0x1231dde0).fault
2015-02-12 18:42:49.094270 7f836797c700 -1 osd.538 54728 heartbeat_check: no reply from osd.176 since back 2015-02-12 18:42:28.444361 front 2015-02-12 18:42:28.444361 (cutoff 2015-02-12 18:42:29.094265)
2015-02-12 18:42:49.622922 7f834cfa6700 -1 osd.538 54728 heartbeat_check: no reply from osd.176 since back 2015-02-12 18:42:33.345980 front 2015-02-12 18:42:28.444361 (cutoff 2015-02-12 18:42:29.622919)
2015-02-12 18:42:51.094801 7f836797c700  0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.960507 secs
2015-02-12 18:42:51.094825 7f836797c700  0 log_channel(default) log [WRN] : slow request 30.960507 seconds old, received at 2015-02-12 18:42:20.134236: osd_op(osd.542.54048:1 100000002ae.00000000 [copy-from ver 7622] 13.7273b256 RETRY=513 snapc 1=[] ondisk+retry+write+ignore_overlay+enforce_snapc+known_if_redirected e54708) currently reached_pg
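In the two heartbeat_check lines above, the reported cutoff is 20 seconds before the time of the check, which is consistent with the default `osd heartbeat grace` of 20 seconds. A minimal sketch of that comparison, assuming the cutoff is simply the check time minus the grace (the 20 s value is inferred from these timestamps, not read from this cluster's configuration) and that a peer is reported when either its back (cluster network) or front (public network) reply is older than the cutoff. The second line supports the latter: the back reply at 18:42:33 is newer than the cutoff, but the front reply at 18:42:28 is not. `heartbeat_check` here is a hypothetical helper, not Ceph's own function.

```python
from datetime import datetime, timedelta

# Timestamp format used in the ceph-osd log lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Assumption: 20 s grace inferred from the log (check time minus cutoff).
GRACE = timedelta(seconds=20)


def ts(s: str) -> datetime:
    return datetime.strptime(s, FMT)


def heartbeat_check(now: datetime, back: datetime, front: datetime,
                    grace: timedelta = GRACE):
    """Return (cutoff, stale); stale is True if either the back or the
    front heartbeat reply predates the cutoff."""
    cutoff = now - grace
    return cutoff, back < cutoff or front < cutoff


# (check time, last back reply, last front reply, cutoff as logged) for osd.176.
checks = [
    ("2015-02-12 18:42:49.094270", "2015-02-12 18:42:28.444361",
     "2015-02-12 18:42:28.444361", "2015-02-12 18:42:29.094265"),
    ("2015-02-12 18:42:49.622922", "2015-02-12 18:42:33.345980",
     "2015-02-12 18:42:28.444361", "2015-02-12 18:42:29.622919"),
]

results = []
for now, back, front, logged_cutoff in checks:
    cutoff, stale = heartbeat_check(ts(now), ts(back), ts(front))
    # The log line is stamped slightly after the cutoff is computed,
    # so compare against the logged cutoff with a small tolerance.
    skew = abs(cutoff - ts(logged_cutoff))
    results.append((stale, skew))
    print(f"cutoff={cutoff} stale={stale} "
          f"skew={skew.total_seconds() * 1e6:.0f}us")
```

Both checks come out stale, and the computed cutoffs land within a few microseconds of the logged ones.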



2015-02-12 18:42:53.354106 7f9655242900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 84689
2015-02-12 18:42:53.359088 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:53.359116 7f9655242900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:42:53.395684 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:53.395711 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:53.445563 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:53.445652 7f9655242900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:53.579957 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:42:53.584720 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.626940 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.704585 7f9655242900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:42:53.734618 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:53.771148 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:53.771179 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:53.779389 7f9655242900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:53.779449 7f9655242900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:53.913933 7f9655242900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:42:53.918308 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.951526 7f9655242900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:53.952920 7f9655242900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:42:53.959341 7f9655242900  0 osd.538 54728 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:42:53.959356 7f9655242900  0 osd.538 54728 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:42:53.959360 7f9655242900  0 osd.538 54728 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:42:53.959378 7f9655242900  0 osd.538 54728 load_pgs
2015-02-12 18:42:54.306386 7f9655242900  0 osd.538 54728 load_pgs opened 137 pgs
2015-02-12 18:42:54.307429 7f9655242900 -1 osd.538 54728 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-02-12 18:42:54.314711 7f9641cb7700  0 osd.538 54728 ignoring osdmap until we have initialized
2015-02-12 18:42:54.314749 7f9641cb7700  0 osd.538 54728 ignoring osdmap until we have initialized
2015-02-12 18:42:54.371560 7f9655242900  0 osd.538 54728 done with init, starting boot process
2015-02-12 18:42:56.079385 7f95fde9b700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.4:6861/15126717 pipe(0xacf5340 sd=504 :6874 s=0 pgs=0 cs=0 l=0 c=0x9d72c60).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.160775 7f95eecaa700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.5:6942/14126479 pipe(0xa814840 sd=624 :6874 s=0 pgs=0 cs=0 l=0 c=0xb321a20).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.170650 7f96000bd700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.13:6808/14152675 pipe(0xaa41340 sd=486 :6874 s=0 pgs=0 cs=0 l=0 c=0xa8d9b20).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.215545 7f95e7533700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.13:6903/11158823 pipe(0xb1da100 sd=683 :6874 s=0 pgs=0 cs=0 l=0 c=0xaea0260).accept connect_seq 0 vs existing 0 state connecting
2015-02-12 18:42:56.222787 7f95e712f700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.11:6831/10414111 pipe(0xb1d98c0 sd=686 :6874 s=0 pgs=0 cs=0 l=0 c=0xae9fe40).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.471608 7f95d6a29700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.6:6872/17198411 pipe(0xb593600 sd=813 :6874 s=0 pgs=0 cs=0 l=0 c=0xaf41b80).accept connect_seq 0 vs existing 0 state wait
2015-02-12 18:42:56.551898 7f95d4403700  0 -- 10.1.100.14:6874/84689 >> 10.1.100.1:6835/17116573 pipe(0xb593080 sd=832 :6874 s=0 pgs=0 cs=0 l=0 c=0xaf418c0).accept connect_seq 0 vs existing 0 state wait


2015-02-12 18:42:59.123753 7f7175bf7900  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 86860
2015-02-12 18:42:59.128606 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:42:59.128620 7f7175bf7900  1 filestore(/var/lib/ceph/osd/ceph-538)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:42:59.202824 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:42:59.202851 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:42:59.402460 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:42:59.402541 7f7175bf7900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:42:59.571199 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:42:59.576472 7f7175bf7900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:42:59.983516 7f7175bf7900  1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:43:00.245124 7f7175bf7900  1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:43:00.348046 7f7175bf7900  0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:43:00.396662 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:43:00.396682 7f7175bf7900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

[ 6216.144870] init: ceph-osd (ceph/538) main process ended, respawning
[ 6223.548268] init: ceph-osd (ceph/538) main process (1035681) terminated with status 1
[ 6223.548295] init: ceph-osd (ceph/538) main process ended, respawning
[ 6230.306315] init: ceph-osd (ceph/538) main process (1037980) terminated with status 1
[ 6230.306337] init: ceph-osd (ceph/538) main process ended, respawning
[ 6239.132669] init: ceph-osd (ceph/538) main process (1040206) terminated with status 1
[ 6239.132687] init: ceph-osd (ceph/538) main process ended, respawning
[ 6245.699440] init: ceph-osd (ceph/538) main process (1042452) terminated with status 1
[ 6245.699463] init: ceph-osd (ceph/538) main process ended, respawning
[ 6254.057325] init: ceph-osd (ceph/538) main process (1044412) terminated with status 1
[ 6254.057342] init: ceph-osd (ceph/538) main process ended, respawning
[ 6261.686181] init: ceph-osd (ceph/538) main process (1046709) terminated with status 1
[ 6261.686198] init: ceph-osd (ceph/538) main process ended, respawning
[ 6269.204085] init: ceph-osd (ceph/538) main process (1049003) terminated with status 1
[ 6269.204102] init: ceph-osd (ceph/538) main process ended, respawning
[ 6276.458609] init: ceph-osd (ceph/538) main process (1051292) terminated with status 1
[ 6276.458634] init: ceph-osd (ceph/538) main process ended, respawning
[ 6283.972596] init: ceph-osd (ceph/538) main process (1053612) terminated with status 1
[ 6283.972617] init: ceph-osd (ceph/538) main process ended, respawning
[ 6291.281523] init: ceph-osd (ceph/538) main process (1055886) terminated with status 1
[ 6291.281548] init: ceph-osd (ceph/538) main process ended, respawning
[ 6299.595198] init: ceph-osd (ceph/538) main process (1058175) terminated with status 1
[ 6299.595217] init: ceph-osd (ceph/538) main process ended, respawning
[ 6307.142994] init: ceph-osd (ceph/538) main process (1060419) terminated with status 1
[ 6307.143013] init: ceph-osd (ceph/538) main process ended, respawning
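The upstart lines show the ceph/538 job exiting with status 1 and being restarted over and over. A minimal sketch that quantifies the loop from these kernel log lines, where the bracketed numbers are seconds since boot. The regex is written against the exact line format shown here and is an assumption about that format only; `respawn_stats` is a hypothetical helper.

```python
import re

# Kernel log lines copied from above.
LOG = """\
[ 6216.144870] init: ceph-osd (ceph/538) main process ended, respawning
[ 6223.548268] init: ceph-osd (ceph/538) main process (1035681) terminated with status 1
[ 6223.548295] init: ceph-osd (ceph/538) main process ended, respawning
[ 6230.306315] init: ceph-osd (ceph/538) main process (1037980) terminated with status 1
[ 6230.306337] init: ceph-osd (ceph/538) main process ended, respawning
[ 6239.132669] init: ceph-osd (ceph/538) main process (1040206) terminated with status 1
[ 6239.132687] init: ceph-osd (ceph/538) main process ended, respawning
[ 6245.699440] init: ceph-osd (ceph/538) main process (1042452) terminated with status 1
[ 6245.699463] init: ceph-osd (ceph/538) main process ended, respawning
[ 6254.057325] init: ceph-osd (ceph/538) main process (1044412) terminated with status 1
[ 6254.057342] init: ceph-osd (ceph/538) main process ended, respawning
[ 6261.686181] init: ceph-osd (ceph/538) main process (1046709) terminated with status 1
[ 6261.686198] init: ceph-osd (ceph/538) main process ended, respawning
[ 6269.204085] init: ceph-osd (ceph/538) main process (1049003) terminated with status 1
[ 6269.204102] init: ceph-osd (ceph/538) main process ended, respawning
[ 6276.458609] init: ceph-osd (ceph/538) main process (1051292) terminated with status 1
[ 6276.458634] init: ceph-osd (ceph/538) main process ended, respawning
[ 6283.972596] init: ceph-osd (ceph/538) main process (1053612) terminated with status 1
[ 6283.972617] init: ceph-osd (ceph/538) main process ended, respawning
[ 6291.281523] init: ceph-osd (ceph/538) main process (1055886) terminated with status 1
[ 6291.281548] init: ceph-osd (ceph/538) main process ended, respawning
[ 6299.595198] init: ceph-osd (ceph/538) main process (1058175) terminated with status 1
[ 6299.595217] init: ceph-osd (ceph/538) main process ended, respawning
[ 6307.142994] init: ceph-osd (ceph/538) main process (1060419) terminated with status 1
[ 6307.143013] init: ceph-osd (ceph/538) main process ended, respawning
"""

# Matches only the "terminated with status N" lines, not the "respawning" ones.
TERMINATED = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] init: ceph-osd \((?P<job>[^)]+)\) "
    r"main process \((?P<pid>\d+)\) terminated with status (?P<status>\d+)"
)


def respawn_stats(log: str):
    """Return (number of exits, set of exit statuses, gaps between exits in s)."""
    events = [m.groupdict() for m in TERMINATED.finditer(log)]
    times = [float(e["uptime"]) for e in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    statuses = {int(e["status"]) for e in events}
    return len(events), statuses, gaps


count, statuses, gaps = respawn_stats(LOG)
print(f"{count} exits, statuses={sorted(statuses)}, "
      f"gap min={min(gaps):.1f}s max={max(gaps):.1f}s "
      f"mean={sum(gaps) / len(gaps):.1f}s")
```

For this excerpt it reports 12 exits, all with status 1, roughly 6.6 to 8.8 seconds apart. It measures the restart loop only; it does not identify why each run exits.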


-- 
 Regards   
K.Mohamed Pakkeer


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
