Re: Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

Özhan Rüzgar Karaman <oruzgarkaraman@xxxxxxxxx> · Tue, 21 Mar 2017 10:57:23 +0300

Hi Wido,

Over the weekend I rolled back all servers to version 0.94.9-1, and everything worked fine with the old release.

Today I upgraded all monitor servers and one OSD server to version 0.94.10-1. Each OSD server has 2 OSDs. I updated ceph.conf on that OSD server to remove the debug lines and restarted the OSD daemons.

This time OSD 3 started and ran successfully, but OSD 2 failed again with the same segmentation fault.

I have uploaded the new logs to the same location as ceph.log.wido.20170321-2.tgz; the link is below again.

https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
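For anyone who would rather look at the crash locally than download the whole archive, here is a minimal sketch that pulls the lines around each crash out of an OSD log. The function name, the default marker string "Segmentation fault", and the 20-line context window are my own assumptions for illustration; adjust the marker to whatever the crash line in your log actually says.

```python
from typing import Iterable, List


def crash_excerpts(
    lines: Iterable[str],
    marker: str = "Segmentation fault",  # assumed text of the crash line
    context: int = 20,                   # assumed number of following lines to keep
) -> List[List[str]]:
    """Collect each line containing `marker` plus the next `context` lines."""
    excerpts: List[List[str]] = []
    remaining = 0
    for line in lines:
        if marker in line:
            excerpts.append([])          # start a new excerpt at every marker hit
            remaining = context + 1      # the marker line itself plus the context
        if remaining > 0:
            excerpts[-1].append(line.rstrip("\n"))
            remaining -= 1
    return excerpts
```

Passing it an open file handle for the failing OSD's log (usually somewhere under /var/log/ceph/) returns one list of lines per crash, which is normally enough to see the backtrace without the rest of the log.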

Thanks for all your help.

Özhan

On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 17 March 2017 at 8:39, Özhan Rüzgar Karaman <oruzgarkaraman@xxxxxxxxx> wrote:

>

>

> Hi,
>
> Yesterday I started upgrading my Ceph environment from 0.94.9 to 0.94.10.
> All monitor servers upgraded successfully, but I am having problems
> starting the upgraded OSD daemons.

>

> When I try to start a Ceph OSD daemon (/usr/bin/ceph-osd), it receives a
> segmentation fault and dies after 2-3 minutes. To narrow down the issue, I
> rolled back the Ceph packages on that OSD server to 0.94.9, and the
> problematic server could then rejoin the 0.94.10 cluster.

>

> My environment is a standard Ubuntu 14.04.5 (Trusty) server with a 4.4.x
> kernel, and I am using the standard packages from
> http://eu.ceph.com/debian-hammer; there is nothing special about my setup.

>

> I have uploaded the Ceph OSD Logs to the link below.

>

> https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing

>

> And my ceph.conf is below

>

> [global]
> fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
> mon_initial_members = mont1, mont2, mont3
> mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> public_network = 172.16.51.0/24
> cluster_network = 172.16.51.0/24
> debug_ms = 0/0
> debug_auth = 0/0
>
> [mon]
> mon_allow_pool_delete = false
> mon_osd_down_out_interval = 300
> osd_pool_default_flag_nodelete = true
>
> [osd]
> filestore_max_sync_interval = 15
> filestore_fiemap = true
> osd_max_backfills = 1
> osd_backfill_scan_min = 16
> osd_backfill_scan_max = 128
> osd_max_scrubs = 1
> osd_scrub_sleep = 1
> osd_scrub_chunk_min = 2
> osd_scrub_chunk_max = 16
> debug_osd = 0/0
> debug_filestore = 0/0
> debug_rbd = 0/0
> debug_rados = 0/0
> debug_journal = 0/0
> debug_journaler = 0/0

Can you try without all the debug_* lines and see what the log then yields?

It's crashing on something which isn't logged now.
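If editing the file by hand is tedious, the debug_* removal can be scripted. This is only a rough sketch: the function name strip_debug_options is made up for illustration, it treats any "key = value" line whose key starts with "debug_" (or "debug " with a space, since Ceph accepts both spellings) as a debug override, and it works on the text of the file rather than touching /etc/ceph/ceph.conf itself. The sample text is a shortened copy of the config quoted above.

```python
def strip_debug_options(conf_text: str) -> str:
    """Return conf_text without any `debug_*` option lines."""
    kept = []
    for line in conf_text.splitlines():
        # Normalise "debug ms" to "debug_ms" before checking the prefix.
        key = line.split("=", 1)[0].strip().lower().replace(" ", "_")
        if "=" in line and key.startswith("debug_"):
            continue  # drop the override so the default log level applies again
        kept.append(line)
    return "\n".join(kept) + "\n"


# Shortened sample taken from the ceph.conf quoted in this thread.
sample = """[global]
auth_client_required = cephx
debug_ms = 0/0

[osd]
osd_max_scrubs = 1
debug_osd = 0/0
"""

print(strip_debug_options(sample), end="")
```

Write the result to a copy first and compare it with the original before replacing the live config, then restart the OSD so the default log levels take effect.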

Wido

>

> Thanks for all your help.

>

> Özhan

> _______________________________________________

> ceph-users mailing list

> ceph-users@xxxxxxxxxxxxxx

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
