Jewel 10.2.2 - Error when flushing journal

Hello ceph people,

Yesterday I stopped one of my OSDs via

root@:~# systemctl stop ceph-osd@10

and tried to flush the journal for this OSD via

root@:~# ceph-osd -i 10 --flush-journal

but got this output on the screen:

SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Segmentation fault

This is the logfile from my osd.10 with further information:
- http://slexy.org/view/s21tfwQ1fZ
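
If someone wants to dig into the backtrace: I could also provide the disassembly mentioned in the NOTE above, roughly like this (only a sketch - I am assuming the binary lives under /usr/bin and that a debug-symbol package such as ceph-osd-dbg is installed so the output is useful):

root@:~# objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump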

Today I stopped another OSD (osd.11):

root@:~# systemctl stop ceph-osd@11

This time I did not get the above-mentioned error, but this:

root@:~# ceph-osd -i 11 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal /var/lib/ceph/osd/ceph-11/journal for object store /var/lib/ceph/osd/ceph-11

This is the logfile from my osd.11 with further information:
- http://slexy.org/view/s2AlEhV38m

This is not really an issue for me right now, because I am going to set up the journal partitions again with 20GB (instead of the current 5GB) and then bring the OSDs back up. But I thought I should report this error to the mailing list.
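
For reference, this is roughly the procedure I have in mind per OSD - only a sketch, assuming the new 20GB journal partition already exists on the NVMe and /var/lib/ceph/osd/ceph-<id>/journal points (via symlink) at that new partition:

root@:~# systemctl stop ceph-osd@10
root@:~# ceph-osd -i 10 --flush-journal    # this is the step that segfaulted on osd.10
root@:~# ceph-osd -i 10 --mkjournal
root@:~# systemctl start ceph-osd@10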

This is my setup:

*Software/OS*
- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"

#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON servers
#> uname -a
(as of 31.08.2016) Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

*Server*
3x OSD Server, each with

- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no Hyper-Threading

- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6Gb/s) disks as OSDs

- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 Cores, 8 Threads)
- The third one with 2x Intel(R) Xeon(R) CPU L5430 @ 2.66GHz ==> 8 Cores, no Hyper-Threading

- 32 GB RAM
- 1x RAID 10 (4 Disks)

*Network*
- Currently each server and client has one active connection at 1x 1GbE; this will shortly be changed to 2x 10GbE fibre, perhaps with LACP where possible.

- We do not use jumbo frames yet.

- Public and cluster network Ceph traffic currently goes through this single active 1GbE interface on each server.

hf
- Mehmet


