Re: Ceph 12.2.0 on 32bit?

Dyweni - Ceph-Users <6EXbab4FYk8H@xxxxxxxxxx> · Fri, 22 Sep 2017 22:47:45 -0500

It crashes with SimpleMessenger as well  (ms_type = simple)

I've also tried with and without these two settings, but it still crashes:
bluestore cache size = 536870912
bluestore cache kv max = 268435456
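For reference, both values are exact powers of two: 512 MiB for the overall BlueStore cache and 256 MiB for the kv cap. Here is a trivial Python check of the conversion. The `to_mib` helper is just for illustration and is not anything from Ceph.

```python
# Convert the two BlueStore cache settings tried above from bytes to MiB.
MIB = 1024 ** 2  # 1 MiB = 2**20 bytes

settings = {
    "bluestore cache size": 536870912,
    "bluestore cache kv max": 268435456,
}

def to_mib(n_bytes: int) -> float:
    """Return a byte count expressed in MiB."""
    return n_bytes / MIB

for name, value in settings.items():
    print(f"{name} = {value} bytes = {to_mib(value):.0f} MiB")
```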

When using SimpleMessenger, the segmentation fault is reported in 'thread_name:ms_pipe_write'. This is common to all crashes under SimpleMessenger, just as 'msgr-worker-<n>' was common to all crashes under AsyncMessenger.

The node I'm testing this on is running a 32bit kernel (4.12.5) and has 8GB of RAM (per free -m).

Per 'ps aux', VSZ and RSS (both in KiB) never get much above 1196392 and 544024 respectively. (In one run they never got past 999536 and 329712 respectively.)
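To put those numbers next to the address-space theory, here is a rough Python sketch. `ps` on Linux reports VSZ and RSS in KiB. The sketch also assumes, and this is only an assumption, that the kernel uses the default x86 3G/1G split, which gives each 32-bit process about 3 GiB of user address space. The helper names are hypothetical.

```python
# Compare the peak VSZ/RSS reported by `ps aux` (both in KiB) against the
# user-space address range a 32-bit process typically gets.
KIB_PER_GIB = 1024 ** 2

# ASSUMPTION: x86 32-bit kernel with the default 3G/1G split, so each
# process has roughly 3 GiB of virtual address space. Other splits
# (e.g. CONFIG_VMSPLIT_2G) change this number.
USER_ADDRESS_SPACE_GIB = 3.0

def kib_to_gib(kib: int) -> float:
    """Convert a KiB count (as printed by ps) to GiB."""
    return kib / KIB_PER_GIB

def headroom_gib(vsz_kib: int, limit_gib: float = USER_ADDRESS_SPACE_GIB) -> float:
    """Virtual address space still unused, in GiB, under the assumed limit."""
    return limit_gib - kib_to_gib(vsz_kib)

observed = [(1196392, 544024), (999536, 329712)]  # (VSZ, RSS) samples from ps aux
for vsz, rss in observed:
    print(f"VSZ {kib_to_gib(vsz):.2f} GiB, RSS {kib_to_gib(rss):.2f} GiB, "
          f"headroom {headroom_gib(vsz):.2f} GiB")
```

On these two samples it reports about 1.14 GiB VSZ with 1.86 GiB headroom, and about 0.95 GiB VSZ with 2.05 GiB headroom. `ps` only gives snapshots, though, so a short-lived spike just before the segfault would not show up here.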

Also, under SimpleMessenger, gdb is reporting stack corruption in the back traces.

What other memory tuning options should I try?

On 2017-09-11 08:05, Gregory Farnum wrote:

You could try setting it to run with SimpleMessenger instead of AsyncMessenger -- the default changed across those releases.
I imagine the root of the problem, though, is that with BlueStore the OSD is using a lot more memory than it used to, so we're overflowing the 32-bit address space... which means a more permanent solution might require turning down the memory tuning options. Sage has discussed those in various places.
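If you want to test that theory directly, one option is to poll the OSD's /proc/<pid>/status. Its VmPeak line records the high-water mark of virtual memory, so it can catch spikes that a one-off `ps` misses. Here is a minimal Python sketch. `parse_vm_fields` and `read_vm_fields` are hypothetical helpers. Because /proc/<pid> disappears when the process dies, you would need to sample it periodically while the OSD is still running.

```python
import re
from pathlib import Path

# Fields of interest from /proc/<pid>/status; the kernel reports them in kB.
FIELDS = ("VmPeak", "VmSize", "VmRSS")
LINE_RE = re.compile(r"^(\w+):\s+(\d+)\s+kB$")

def parse_vm_fields(status_text: str) -> dict:
    """Extract VmPeak/VmSize/VmRSS (in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(1) in FIELDS:
            result[m.group(1)] = int(m.group(2))
    return result

def read_vm_fields(pid: int) -> dict:
    """Read the fields for a live process (Linux only)."""
    return parse_vm_fields(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_vm_fields()` with the ceph-osd PID every few seconds and logging VmPeak would show how close the process gets to the 32-bit ceiling before it crashes.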

On Sun, Sep 10, 2017 at 11:52 PM Dyweni - Ceph-Users <6EXbab4FYk8H@xxxxxxxxxx> wrote:
Hi,

 Is anyone running Ceph Luminous (12.2.0) on 32bit Linux?  Have you seen
 any problems?

 My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc), all running Jewel
 (10.2.1), on 32bit, with no issues at all.

 I've upgraded everything to latest version of Jewel (10.2.9) and still
 no issues.

 Next I upgraded my MON to Luminous (12.2.0) and added MGR to it.  Still
 no issues.

 Next I removed one node from the cluster, wiped it clean, upgraded it to
 Luminous (12.2.), and created a new BlueStore data area.  Now this node
 crashes with segmentation fault usually within a few minutes of starting
 up.  I've loaded symbols and used GDB to examine back traces.  From what
 I can tell, the seg faults are happening randomly, and the stack is
 corrupted, so traces from GDB are unusable (even with all symbols
 installed for all packages on the system). However, in all cases, the
 seg fault is occurring in the 'msgr-worker-<n>' thread.

 My data is fine, just would like to get Ceph 12.2.0 running stably on
 this node, so I can upgrade the remaining nodes and switch everything
 over to BlueStore.

 Thanks,
 Dyweni
 _______________________________________________
 ceph-users mailing list
 ceph-users@xxxxxxxxxxxxxx
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com