Re: Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

Hi Alexey;
About a year ago I read in an OpenStack/Ceph tuning document that setting this parameter to true could give better performance for block-level storage.

Is it completely safe to change it directly from true to false and then restart the Ceph daemons in order?
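
The rolling change I have in mind would be roughly the following (just a
sketch; the osd ids are the ones from this thread, and the restart syntax
assumes the stock Upstart jobs on Trusty):

    # prevent data rebalancing while OSDs restart one at a time
    ceph osd set noout

    # on the OSD host: set "filestore fiemap = false" in the [osd] section
    # of /etc/ceph/ceph.conf, then restart each OSD in turn, waiting for
    # the cluster to report active+clean in between
    sudo restart ceph-osd id=2
    sudo restart ceph-osd id=3

    # re-enable automatic rebalancing once all OSDs are back up
    ceph osd unset noout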

Thanks for all your support.

Özhan


On Tue, Mar 21, 2017 at 3:27 PM, Alexey Sheplyakov <asheplyakov@xxxxxxxxxxxx> wrote:
Hi,

This looks like a bug [1]. You can work around it by disabling the
fiemap feature, like this:

[osd]
filestore fiemap = false

Fiemap is disabled by default; perhaps you've explicitly enabled it?
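
If you want to confirm what a running OSD is actually using, the admin
socket can show the current value (adjust the osd id as needed; this
assumes the default admin socket location):

    ceph daemon osd.3 config show | grep fiemap

The safest way to apply the change is to update ceph.conf and then
restart the OSD.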

[1] http://tracker.ceph.com/issues/19323

Best regards,
      Alexey

On Tue, Mar 21, 2017 at 12:21 PM, Özhan Rüzgar Karaman
<oruzgarkaraman@xxxxxxxxx> wrote:
> Hi Wido;
> After 30 minutes osd id 3 also crashed with a segmentation fault. I uploaded
> the logs again to the same location as ceph.log.wido.20170321-3.tgz. So now
> all OSD daemons on that server have crashed.
>
> Thanks
> Özhan
>
> On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman
> <oruzgarkaraman@xxxxxxxxx> wrote:
>>
>> Hi Wido;
>> Over the weekend I rolled all servers back to version 0.94.9-1, and
>> everything worked fine with the old release.
>>
>> Today I upgraded all monitor servers and one OSD server to version
>> 0.94.10-1. Each OSD server has 2 OSDs. I updated the ceph.conf on the OSD
>> server, removed the debug lines, and restarted the OSD daemons.
>>
>> This time osd id 3 started and operated successfully, but osd id 2 failed
>> again with the same segmentation fault.
>>
>> I have uploaded the new logs to the same destination as
>> ceph.log.wido.20170321-2.tgz; the link is below again.
>>
>>
>> https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
>>
>> Thanks for all your help.
>>
>> Özhan
>>
>>
>> On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>
>>>
>>> > On 17 March 2017 at 8:39, Özhan Rüzgar Karaman
>>> > <oruzgarkaraman@xxxxxxxxx> wrote:
>>> >
>>> >
>>> > Hi;
>>> > Yesterday I started to upgrade my Ceph environment from 0.94.9 to
>>> > 0.94.10. All monitor servers upgraded successfully, but I experienced
>>> > problems starting the upgraded OSD daemons.
>>> >
>>> > When I try to start a Ceph OSD daemon (/usr/bin/ceph-osd), it receives a
>>> > segmentation fault and dies after 2-3 minutes. To isolate the issue I
>>> > rolled the Ceph packages on that OSD server back to 0.94.9, and the
>>> > problematic servers could then rejoin the 0.94.10 cluster.
>>> >
>>> > My environment is a standard Ubuntu 14.04.5 Trusty server with a 4.4.x
>>> > kernel, and I am using the standard packages from
>>> > http://eu.ceph.com/debian-hammer; nothing special in my environment.
>>> >
>>> > I have uploaded the Ceph OSD Logs to the link below.
>>> >
>>> >
>>> > https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
>>> >
>>> > And my ceph.conf is below
>>> >
>>> > [global]
>>> > fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
>>> > mon_initial_members = mont1, mont2, mont3
>>> > mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
>>> > auth_cluster_required = cephx
>>> > auth_service_required = cephx
>>> > auth_client_required = cephx
>>> > filestore_xattr_use_omap = true
>>> > public_network = 172.16.51.0/24
>>> > cluster_network = 172.16.51.0/24
>>> > debug_ms = 0/0
>>> > debug_auth = 0/0
>>> >
>>> > [mon]
>>> > mon_allow_pool_delete = false
>>> > mon_osd_down_out_interval = 300
>>> > osd_pool_default_flag_nodelete = true
>>> >
>>> > [osd]
>>> > filestore_max_sync_interval = 15
>>> > filestore_fiemap = true
>>> > osd_max_backfills = 1
>>> > osd_backfill_scan_min = 16
>>> > osd_backfill_scan_max = 128
>>> > osd_max_scrubs = 1
>>> > osd_scrub_sleep = 1
>>> > osd_scrub_chunk_min = 2
>>> > osd_scrub_chunk_max = 16
>>> > debug_osd = 0/0
>>> > debug_filestore = 0/0
>>> > debug_rbd = 0/0
>>> > debug_rados = 0/0
>>> > debug_journal = 0/0
>>> > debug_journaler = 0/0
>>>
>>> Can you try without all the debug_* lines and see what the log then
>>> yields?
>>>
>>> It's crashing on something which isn't logged now.
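>>>
>>> If it helps, instead of the 0/0 entries you could also turn the relevant
>>> subsystems up, e.g. something like this in the [osd] section (20 is the
>>> most verbose level), so the crash leaves more context in the log:
>>>
>>>     debug_osd = 20
>>>     debug_filestore = 20
>>>     debug_journal = 20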
>>>
>>> Wido
>>>
>>> >
>>> > Thanks for all the help.
>>> >
>>> > Özhan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
