On 03/09/2018 12:49 AM, Brad Hubbard wrote:
> On Fri, Mar 9, 2018 at 3:54 AM, Subhachandra Chandra
> <schandra@xxxxxxxxxxxx> wrote:
>> I noticed a similar crash too. Unfortunately, I did not get much info
>> in the logs.
>>
>> *** Caught signal (Segmentation fault) **
>>
>> Mar 07 17:58:26 data7 ceph-osd-run.sh[796380]: in thread 7f63a0a97700
>> thread_name:safe_timer
>>
>> Mar 07 17:58:28 data7 ceph-osd-run.sh[796380]: docker_exec.sh: line 56:
>> 797138 Segmentation fault (core dumped) "$@"
>
> The log isn't very helpful AFAICT. Are these both container
> environments? If so, what are the details (OS, etc.)?

In my case (reported in the OP) it is not a container. I'm running:

- ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
- CentOS 7.4 (fully updated on 03.02.2017)
- Spectre and Meltdown workarounds disabled (kernel options: noibrs
  noibpb nopti)

The cluster consists of:

- 3x MON/MDS hosts (128GB RAM)
- 10x OSD hosts (128GB RAM), each with 22 HDD OSDs, 2 SSD OSDs and
  2 NVMe devices for wal/db

Ceph is using bluestore; wal and db are separated onto the NVMe devices
(1GB wal, 64GB db).

There are 3 pools:

1: 3x replicated (all SSD OSDs): data
2: 3x replicated (all SSD OSDs): metadata pool for the EC pool
3: 6+3 EC pool (all HDD OSDs) -> metadata on pool 2

The pools are used for cephfs only:

# ceph fs ls
name: cephfs, metadata pool: ssd-rep-metadata-pool, data pools:
[hdd-ec-data-pool ssd-rep-data-pool ]

(A rough reconstruction of how such a layout can be built is sketched at
the end of this mail.)

> Can anyone capture a core file? Please feel free to open a tracker on
> this.

I have no core file available (none was dumped), and so far I've noticed
just that single segfault. Next time I'll try to make sure a core
actually gets written; see the sketches at the end of this mail.

Dietmar

>> Thanks
>>
>> Subhachandra
>>
>> On Thu, Mar 8, 2018 at 6:00 AM, Dietmar Rieder
>> <dietmar.rieder@xxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> I noticed in my client (using cephfs) logs that an osd was
>>> unexpectedly going down.
>>> While checking the logs of the affected OSD I found that it was
>>> segfaulting:
>>>
>>> [....]
>>> 2018-03-07 06:01:28.873049 7fd9af370700 -1 *** Caught signal
>>> (Segmentation fault) **
>>> in thread 7fd9af370700 thread_name:safe_timer
>>>
>>> ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
>>> luminous (stable)
>>> 1: (()+0xa3c611) [0x564585904611]
>>> 2: (()+0xf5e0) [0x7fd9b66305e0]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> [...]
>>>
>>> Should I open a ticket for this? What additional information is
>>> needed?
>>>
>>> I put the relevant log entries for download under [1], so maybe
>>> someone with more experience can find some useful information therein.
>>>
>>> Thanks
>>> Dietmar
>>>
>>> [1] https://expirebox.com/download/6473c34c80e8142e22032469a59df555.html
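PS: since the pool layout above is a bit terse, here is roughly how such
a layout can be built on Luminous. This is a sketch from memory, not the
exact commands I ran: the pool names match my `ceph fs ls` output above,
but the rule/profile names and pg counts are placeholders, and the order
of the data pools may come out differently.

  # replicated pools restricted to SSD OSDs via a device-class rule
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  ceph osd pool create ssd-rep-metadata-pool 128 128 replicated replicated-ssd
  ceph osd pool create ssd-rep-data-pool 128 128 replicated replicated-ssd

  # 6+3 EC profile restricted to HDD OSDs, and the EC data pool
  ceph osd erasure-code-profile set ec-63-hdd k=6 m=3 crush-device-class=hdd
  ceph osd pool create hdd-ec-data-pool 1024 1024 erasure ec-63-hdd
  ceph osd pool set hdd-ec-data-pool allow_ec_overwrites true

  # cephfs on top, with the EC pool added as an additional data pool
  ceph fs new cephfs ssd-rep-metadata-pool ssd-rep-data-pool
  ceph fs add_data_pool cephfs hdd-ec-data-pool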
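Regarding the core file: for the archives, this is what I plan to set up
so the next segfault actually leaves a core behind. An untested sketch,
assuming systemd-managed OSDs outside a container; <id> is the OSD id
and the crash directory is just an example:

  # raise the core size limit for the OSD unit (the soft limit is often
  # 0); `systemctl edit` opens a drop-in file, add the two lines shown
  systemctl edit ceph-osd@<id>
      [Service]
      LimitCORE=infinity

  systemctl daemon-reload
  systemctl restart ceph-osd@<id>

  # have the kernel write cores to a known place with enough space
  # (note: on CentOS 7, abrt may already be hooked into core_pattern)
  mkdir -p /var/crash
  sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t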
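And should a core show up, my understanding is that the trace can be
decoded roughly like this (again a sketch; the debuginfo package name is
what the upstream el7 repo ships, as far as I know):

  yum install gdb ceph-debuginfo
  gdb /usr/bin/ceph-osd /var/crash/core.ceph-osd.<pid>.<time>
  (gdb) thread apply all bt

  # even without a core, the in-binary offset from the log above
  # ("1: (()+0xa3c611)") can sometimes be resolved directly:
  addr2line -Cfe /usr/bin/ceph-osd 0xa3c611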
--
_________________________________________
D i e t m a r R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Email: dietmar.rieder@xxxxxxxxxxx
Web: http://www.icbi.at

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com