Good day!
I tried to nullify this OSD and reinject it, with no success. It works for a little while and then crashes again.
Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
cell:+79231534853
2013/6/5 Artem Silenkov <artem.silenkov@xxxxxxxxx>
Hello!

We have a simple setup, as follows:

Debian GNU/Linux 6.0 x64
Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64 GNU/Linux

ii ceph 0.61.2-1~bpo60+1 distributed storage and file system
ii ceph-common 0.61.2-1~bpo60+1 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.61.2-1~bpo60+1 common utilities to mount and interact with a ceph file system
ii ceph-fuse 0.61.2-1~bpo60+1 FUSE-based client for the Ceph distributed file system
ii ceph-mds 0.61.2-1~bpo60+1 metadata server for the ceph distributed file system
ii libcephfs1 0.61.2-1~bpo60+1 Ceph distributed file system client library
ii libc-bin 2.11.3-4 Embedded GNU C Library: Binaries
ii libc-dev-bin 2.11.3-4 Embedded GNU C Library: Development binaries
ii libc6 2.11.3-4 Embedded GNU C Library: Shared libraries
ii libc6-dev 2.11.3-4 Embedded GNU C Library: Development Libraries and Header Files

All programs are running fine except osd.2, which is crashing repeatedly. All other nodes run the same operating system and the system environment is practically identical.

#cat /etc/ceph/ceph.conf
[global]
pid file = /var/run/ceph/$name.pid
auth cluster required = none
auth service required = none
auth client required = none
max open files = 65000

[mon]
[mon.0]
host = h01
mon addr = 10.1.1.3:6789
[mon.1]
host = h07
mon addr = 10.1.1.10:6789
[mon.2]
host = h08
mon addr = 10.1.1.11:6789

[mds]
[mds.3]
host = h09
[mds.4]
host = h06

[osd]
osd journal size = 10000
osd journal = /var/lib/ceph/journal/$cluster-$id/journal
osd mkfs type = xfs
[osd.0]
host = h01
addr = 10.1.1.3
devs = /dev/sda3
[osd.1]
host = h07
addr = 10.1.1.10
devs = /dev/sda3
[osd.2]
host = h08
addr = 10.1.1.11
devs = /dev/sda3
[osd.3]
host = h09
addr = 10.1.1.12
devs = /dev/sda3
[osd.4]
host = h06
addr = 10.1.1.9
devs = /dev/sda3

~#ceph osd tree
# id weight type name up/down reweight
-1 5 root default
-3 5 rack unknownrack
-2 1 host h01
0 1 osd.0 up 1
-4 1 host h07
1 1 osd.1 up 1
-5 1 host h08
2 1 osd.2 down 0
-6 1 host h09
3 1 osd.3 up 1
-7 1 host h06
4 1 osd.4 up 1

When it crashes, the ceph-osd process can fall into a zombie state, and then it is not even possible to umount the OSD partition.

gdb shows the following:

#gdb /usr/bin/ceph-osd /core
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /usr/bin/ceph-osd...(no debugging symbols found)...done.
[New Thread 809630]
[New Thread 809628]
[New Thread 809631]
[New Thread 809632]
[New Thread 809633]
[New Thread 809634]
[New Thread 809672]
[New Thread 809629]
[New Thread 809524]
[New Thread 809421]
[New Thread 137559]
[New Thread 809636]
[New Thread 809635]
[New Thread 809677]
[New Thread 809679]
[New Thread 809527]
[New Thread 137560]
[New Thread 809420]
[New Thread 809637]
[New Thread 809685]
[New Thread 809525]
[New Thread 809638]
[New Thread 99663]
[New Thread 809523]
[New Thread 809639]
[New Thread 809522]
[New Thread 809640]
[New Thread 809644]
[New Thread 809641]
[New Thread 809643]
[New Thread 809648]
[New Thread 809668]
[New Thread 809669]
[New Thread 809671]
[New Thread 809676]
[New Thread 809680]
[New Thread 809681]
[New Thread 56075]
[New Thread 809682]
[New Thread 107924]
[New Thread 809683]
[New Thread 108037]
[New Thread 809684]
[New Thread 119704]
[New Thread 809686]
[New Thread 809537]
[New Thread 56073]
[New Thread 85231]
[New Thread 85232]
[New Thread 99661]
[New Thread 809535]
[New Thread 99662]
[New Thread 107922]
[New Thread 119705]
[New Thread 107928]
[New Thread 108035]
[New Thread 809410]
[New Thread 809528]
[New Thread 809530]
[New Thread 809531]
[New Thread 809533]
[New Thread 809536]
[New Thread 809642]
[New Thread 809534]
[New Thread 809411]
[New Thread 809645]
[New Thread 809667]
[New Thread 809670]
[New Thread 809526]
[New Thread 809521]
[New Thread 809532]
[New Thread 809529]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libaio.so.1
Reading symbols from /usr/lib/libnss3.so.1d...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libnss3.so.1d
Reading symbols from /usr/lib/libnspr4.so.0d...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libnspr4.so.0d
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libtcmalloc.so.0
Reading symbols from /usr/lib/libboost_thread.so.1.42.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_thread.so.1.42.0
Reading symbols from /usr/lib/libleveldb.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libleveldb.so.1
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libnssutil3.so.1d...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libnssutil3.so.1d
Reading symbols from /usr/lib/libplc4.so.0d...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libplc4.so.0d
Reading symbols from /usr/lib/libplds4.so.0d...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libplds4.so.0d
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libunwind.so.7
Reading symbols from /usr/lib/libsnappy.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libsnappy.so.1
Reading symbols from /usr/lib/nss/libsoftokn3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/nss/libsoftokn3.so
Reading symbols from /usr/lib/libsqlite3.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libsqlite3.so.0
Reading symbols from /usr/lib/nss/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/nss/libfreebl3.so
Reading symbols from /usr/lib/rados-classes/libcls_lock.so...done.
Loaded symbols for /usr/lib/rados-classes/libcls_lock.so
Reading symbols from /usr/lib/libboost_system.so.1.42.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libboost_system.so.1.42.0
Reading symbols from /usr/lib/rados-classes/libcls_rgw.so...done.
Loaded symbols for /usr/lib/rados-classes/libcls_rgw.so
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff87fe000
Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0 0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
(gdb) bt
#0 0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
#1 0x00000000007a16c7 in ?? ()
#2 <signal handler called>
#3 0x00007f7e97cf21b5 in raise () from /lib/libc.so.6
#4 0x00007f7e97cf4fc0 in abort () from /lib/libc.so.6
#5 0x00007f7e98586dc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007f7e98585166 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007f7e98585193 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007f7e9858528e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x00000000007f9f79 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#10 0x0000000000763ca1 in SyncEntryTimeout::finish(int) ()
#11 0x00000000005b828a in Context::complete(int) ()
#12 0x00000000008b3793 in SafeTimer::timer_thread() ()
#13 0x00000000008b595d in SafeTimerThread::entry() ()
#14 0x00007f7e994b18ca in start_thread () from /lib/libpthread.so.0
#15 0x00007f7e97d8fb6d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb)

The problem affects only osd.2; all other services are running fine. I have a lot of core dumps if anyone needs them.

Please help fix this issue. Our cluster is currently running as follows:

#ceph -w
health HEALTH_WARN 2 pgs backfilling; 2 pgs degraded; 3 pgs recovering; 39 pgs recovery_wait; 44 pgs stuck unclean; recovery 157580/1744054 degraded (9.035%); recovering 105 o/s, 7442KB/s; 1 mons down, quorum 0,1 0,1
monmap e1: 3 mons at {0=10.1.1.3:6789/0,1=10.1.1.10:6789/0,2=10.1.1.11:6789/0}, election epoch 112, quorum 0,1 0,1
osdmap e200: 6 osds: 4 up, 4 in
pgmap v1133760: 1208 pgs: 1164 active+clean, 39 active+recovery_wait, 2 active+degraded+backfilling, 3 active+recovering; 88915 MB data, 170 GB used, 573 GB / 744 GB avail; 119KB/s rd, 763KB/s wr, 18op/s; 157580/1744054 degraded (9.035%); recovering 105 o/s, 7442KB/s
mdsmap e16: 1/1/1 up {0=4=up:active}, 1 up:standby

Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
cell:+79231534853
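
With this many core dumps it may help to pull the interesting frame out of each backtrace automatically. The following is only a rough sketch, not a Ceph tool: it takes the text of a gdb `bt` and reports which function called `ceph::__ceph_assert_fail`. The sample frames are copied from the trace in this message; the helper names `parse_frames` and `assert_caller` are made up for illustration.

```python
import re

# Frames 8-12 copied from the gdb backtrace in this message.
BACKTRACE = """\
#8 0x00007f7e9858528e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x00000000007f9f79 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#10 0x0000000000763ca1 in SyncEntryTimeout::finish(int) ()
#11 0x00000000005b828a in Context::complete(int) ()
#12 0x00000000008b3793 in SafeTimer::timer_thread() ()
"""

# "#<n> 0x<addr> in <name>" ; the name ends at the first space or "(".
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in ([^\s(]+)")


def parse_frames(text):
    """Return a list of (frame_number, function_name) tuples.

    Lines without an address (e.g. "<signal handler called>") are skipped.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def assert_caller(frames):
    """Return the function in the frame just below __ceph_assert_fail, or None."""
    for (_, name), (_, caller) in zip(frames, frames[1:]):
        if name.endswith("__ceph_assert_fail"):
            return caller
    return None


frames = parse_frames(BACKTRACE)
caller = assert_caller(frames)
print(caller)
```

For the trace in this message the failing assert is raised from `SyncEntryTimeout::finish`, which was called from the `SafeTimer` thread, so every core that reports the same caller is most likely the same crash.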