It crashes with SimpleMessenger as well (ms_type = simple)
I've also tried with and without these two settings, but it still crashes:

bluestore cache size = 536870912
bluestore cache kv max = 268435456
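For reference, those two values work out to 512 MiB and 256 MiB. A minimal sketch to check the arithmetic follows; the helper name `bytes_to_mib` is only illustrative and not part of any Ceph tooling:

```python
# Convert the two BlueStore cache values quoted above from bytes to MiB.
MIB = 1024 * 1024

def bytes_to_mib(n_bytes):
    """Return a byte count expressed in MiB."""
    return n_bytes / MIB

# Values copied from the settings in this message.
settings = {
    "bluestore cache size": 536870912,
    "bluestore cache kv max": 268435456,
}

for name, value in settings.items():
    print(f"{name} = {value} bytes = {bytes_to_mib(value):.0f} MiB")
```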
When using SimpleMessenger, it tells me it is crashing (Segmentation Fault) in 'thread_name:ms_pipe_write'. This is common in all crashes under SimpleMessenger, just like 'msgr-worker-<n>' was common under AsyncMessenger.
The node I'm testing this on is running a 32bit kernel (4.12.5) and has 8GB ram (free -m).
Per 'ps aux', VSZ and RSS never get much above 1196392 and 544024 respectively. (One time they didn't get past 999536 and 329712 respectively.)
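To put those numbers in context: 'ps' reports VSZ and RSS in KiB, so the peak VSZ is roughly 1.1 GiB. The sketch below converts the figures and compares them against the 4 GiB that a 32-bit pointer can address in theory. The space actually usable by a process is smaller and depends on the kernel's user/kernel split, which I have not checked on this node, so treat the comparison as a rough guide only.

```python
# Convert the 'ps aux' figures above (reported in KiB) to MiB and compare
# them with the theoretical 32-bit address space. The real per-process
# limit is lower and depends on the kernel's user/kernel split (assumption:
# not verified on this node).
KIB_PER_MIB = 1024
ADDRESS_SPACE_MIB = 2**32 // (1024 * 1024)  # 4096 MiB addressable with 32 bits

def kib_to_mib(kib):
    """Return a KiB count expressed in MiB."""
    return kib / KIB_PER_MIB

# (VSZ, RSS) pairs in KiB, copied from this message.
samples = {
    "typical peak": (1196392, 544024),
    "lowest peak": (999536, 329712),
}

for label, (vsz, rss) in samples.items():
    vsz_mib = kib_to_mib(vsz)
    rss_mib = kib_to_mib(rss)
    share = 100 * vsz_mib / ADDRESS_SPACE_MIB
    print(f"{label}: VSZ {vsz_mib:.0f} MiB, RSS {rss_mib:.0f} MiB, "
          f"VSZ is {share:.0f}% of {ADDRESS_SPACE_MIB} MiB")
```

If these peaks are representative, the process is dying well short of the theoretical limit, although a single large allocation or address-space fragmentation could still fail before VSZ gets anywhere near it.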
Also, under SimpleMessenger, gdb is reporting stack corruption in the back traces.
What other memory tuning options should I try?
On 2017-09-11 08:05, Gregory Farnum wrote:
You could try setting it to run with SimpleMessenger instead of AsyncMessenger -- the default changed across those releases. I imagine the root of the problem though is that with BlueStore the OSD is using a lot more memory than it used to and so we're overflowing the 32-bit address space...which means a more permanent solution might require turning down the memory tuning options. Sage has discussed those in various places.
Hi,

Is anyone running Ceph Luminous (12.2.0) on 32bit Linux? Have you seen any problems?

My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc), all running Jewel (10.2.1), on 32bit, with no issues at all.

I've upgraded everything to the latest version of Jewel (10.2.9) and still no issues.

Next I upgraded my MON to Luminous (12.2.0) and added MGR to it. Still no issues.

Next I removed one node from the cluster, wiped it clean, upgraded it to Luminous (12.2.), and created a new BlueStore data area. Now this node crashes with a segmentation fault, usually within a few minutes of starting up. I've loaded symbols and used GDB to examine back traces. From what I can tell, the seg faults are happening randomly, and the stack is corrupted, so traces from GDB are unusable (even with all symbols installed for all packages on the system). However, in all cases, the seg fault is occurring in the 'msgr-worker-<n>' thread.

My data is fine, I would just like to get Ceph 12.2.0 running stably on this node, so I can upgrade the remaining nodes and switch everything over to BlueStore.

Thanks,
Dyweni
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com