Good day! Thank you, but it's not clear to me what the bottleneck is here. It could be any of the following (a quick check sketch follows this list):
- hardware node: load average, disk IO
- a problem with the underlying file system on the OSD, or a bad disk
- a Ceph journal problem
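A rough sketch of how each could be checked on the node (standard Linux tools; the journal path comes from the ceph.conf quoted further down):
# uptime && cat /proc/loadavg              (node load)
# iostat -x 1 5                            (per-device await/%util, needs sysstat)
# dmesg | grep -iE 'xfs|sda|i/o error'     (file system / disk errors)
# df -h /var/lib/ceph/journal              (journal partition space and mount)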
The Ceph OSD partition is part of a block device which has practically no load:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,12          0          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,14          0          0
The disk with the OSD is good; I just checked it and got good read/write speed with appropriate IOPS and latency.
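(For what it's worth, a raw check along those lines could look like the following; the OSD data path is only the default guess, not something taken from this thread:)
# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/ddtest bs=1M count=1024 oflag=direct
# dd if=/var/lib/ceph/osd/ceph-2/ddtest of=/dev/null bs=1M iflag=direct
# fio --name=randrw --filename=/var/lib/ceph/osd/ceph-2/fiotest --rw=randrw --bs=4k --size=256M --direct=1 --runtime=30
# rm -f /var/lib/ceph/osd/ceph-2/ddtest /var/lib/ceph/osd/ceph-2/fiotest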
But the hardware node is working hard and has a high load average. I fear that the ceph-osd process lacks resources. Is there any way to fix this? Maybe raise some kind of sync timeout, or give this OSD a lower weight, or something similar?
Or is it better to move this OSD to another server?
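For reference, and assuming (not certain) that the sync timeout behind the crash is the filestore commit timeout, the timeout and weight ideas above would look roughly like this; 1200 is only an illustration (the default is 600 seconds, if I remember correctly).
In ceph.conf on h08:
[osd.2]
filestore commit timeout = 1200

# ceph osd crush reweight osd.2 0.5        (halve the CRUSH weight so less data maps to osd.2)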
Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
http://2gis.ru
a.silenkov at 2gis.ru
gtalk: artem.silenkov at gmail.com
cell: +79231534853
2013/6/5 Gregory Farnum <greg@xxxxxxxxxxx>
This would be easier to see with a log than with all the GDB stuff, but the reference in the backtrace to "SyncEntryTimeout::finish(int)" tells me that the filesystem is taking too long to sync things to disk. Either this disk is bad or you're somehow subjecting it to a much heavier load than the others.
-Greg
--
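Side note on getting such a log: a sketch, assuming the stock log path and the usual Ceph debug options. In ceph.conf on h08:
[osd.2]
debug osd = 20
debug filestore = 20
debug journal = 20
Then restart the daemon and watch the log:
# /etc/init.d/ceph restart osd.2
# tail -f /var/log/ceph/ceph-osd.2.log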
On Wednesday, June 5, 2013, Artem Silenkov wrote:

Good day!
Tried to nullify this osd and reinject it with no success. It works for a little while, then crashes again.

Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
cell: +79231534853

2013/6/5 Artem Silenkov <artem.silenkov@xxxxxxxxx>

Hello!

We have a simple setup as follows:

Debian GNU/Linux 6.0 x64
Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64 GNU/Linux

ii  ceph            0.61.2-1~bpo60+1  distributed storage and file system
ii  ceph-common     0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common  0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph file system
ii  ceph-fuse       0.61.2-1~bpo60+1  FUSE-based client for the Ceph distributed file system
ii  ceph-mds        0.61.2-1~bpo60+1  metadata server for the ceph distributed file system
ii  libcephfs1      0.61.2-1~bpo60+1  Ceph distributed file system client library
ii  libc-bin        2.11.3-4          Embedded GNU C Library: Binaries
ii  libc-dev-bin    2.11.3-4          Embedded GNU C Library: Development binaries
ii  libc6           2.11.3-4          Embedded GNU C Library: Shared libraries
ii  libc6-dev       2.11.3-4          Embedded GNU C Library: Development Libraries and Header Files

All programs are running fine except osd.2, which is crashing repeatedly.
All other nodes have the same operating system on board, and the system environment is practically identical.

# cat /etc/ceph/ceph.conf
[global]
pid file = /var/run/ceph/$name.pid
auth cluster required = none
auth service required = none
auth client required = none
max open files = 65000

[mon]

[mon.0]
host = h01
mon addr = 10.1.1.3:6789

[mon.1]
host = h07
mon addr = 10.1.1.10:6789

[mon.2]
host = h08
mon addr = 10.1.1.11:6789

[mds]

[mds.3]
host = h09

[mds.4]
host = h06

[osd]
osd journal size = 10000
osd journal = /var/lib/ceph/journal/$cluster-$id/journal
osd mkfs type = xfs

[osd.0]
host = h01
addr = 10.1.1.3
devs = /dev/sda3

[osd.1]
host = h07
addr = 10.1.1.10
devs = /dev/sda3

[osd.2]
host = h08
addr = 10.1.1.11
devs = /dev/sda3

[osd.3]
host = h09
addr = 10.1.1.12
devs = /dev/sda3

[osd.4]
host = h06
addr = 10.1.1.9
devs = /dev/sda3

~# ceph osd tree
# id    weight  type name               up/down reweight
-1      5       root default
-3      5               rack unknownrack
-2      1                       host h01
0       1                               osd.0   up      1
-4      1                       host h07
1       1                               osd.1   up      1
-5      1                       host h08
2       1                               osd.2   down    0
-6      1                       host h09
3       1                               osd.3   up      1
-7      1                       host h06
4       1                               osd.4   up      1
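Regarding the "nullify and reinject" attempt mentioned at the top of the quoted message: the usual full removal sequence for an OSD of that era is sketched below (the auth step is effectively a no-op here, since auth is set to none in the quoted ceph.conf); the OSD can then be recreated with ceph osd create and ceph-osd -i 2 --mkfs --mkjournal.
# ceph osd out 2
# /etc/init.d/ceph stop osd.2
# ceph osd crush remove osd.2
# ceph auth del osd.2
# ceph osd rm 2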
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com