Hi,
I have this problem that testing ceph_erasure_code sometimes crashes in:
ceph_erasure_code --debug-osd 20 --plugin_exists jerasure
If I just run this in a while loop on the command line, it crashes
only once every few hundred runs.
Running it in the test suite, it crashes just about every time.
The crash in the core dump is an invalid pointer dereference in the
assertion code I added to log/Entry.h
#7 0x000000000077984d in ceph::log::Entry::hint_size (this=0x80405cf00)
at log/Entry.h:70
70 assert( *m_exp_len != -1 );
(gdb) l
65 }
66
67 // function improves estimate for expected size of message
68 void hint_size() {
69 if (m_exp_len != NULL) {
70 assert( *m_exp_len != -1 );
71 assert( 0 <= *m_exp_len );
72 assert( *m_exp_len <= 100000 );
73 size_t size = m_streambuf.size();
74 if (size > __atomic_load_n(m_exp_len, __ATOMIC_RELAXED)) {
(gdb) p m_exp_len
$1 = (size_t *) 0x8045ec5c0
(gdb) p &m_exp_len
$2 = (size_t **) 0x80405cf90
The address stored in m_exp_len, 0x8045ec5c0, is outside of the heap, so
dereferencing it is an illegal access and the program gets a SIGSEGV.
Now my problem is that I can run this under gdb and watch the memory,
but then it rarely goes wrong.
Running it from 'make recheck' goes wrong just about every time, but it
will be hard to run that through gdb and actually catch the code that is
writing the illegal address into m_exp_len.
Does anybody have suggestions on how to track/debug this?
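One thing I could try without gdb is to put guard words around the
pointer and check them before dereferencing it. Below is a minimal
sketch of that idea, not the real Entry code: GuardedHint, kCanary,
guard_before, guard_after and guards_intact() are all hypothetical names,
and only m_exp_len comes from log/Entry.h. It would only help if the bad
value gets there through an overrun of a neighbouring member. It would
not tell me anything if the whole Entry has already been freed or if the
pointer was wrong from the start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the part of ceph::log::Entry that holds the
// pointer. Only m_exp_len is taken from the real code.
struct GuardedHint {
  // Arbitrary bit pattern that is unlikely to appear by accident.
  static constexpr std::uint64_t kCanary = 0xDEADBEEFCAFEF00DULL;

  std::uint64_t guard_before = kCanary;  // sits directly before the pointer
  std::size_t *m_exp_len = nullptr;      // the member that ends up corrupted
  std::uint64_t guard_after = kCanary;   // sits directly after the pointer

  // True while neither guard word has been overwritten.
  bool guards_intact() const {
    return guard_before == kCanary && guard_after == kCanary;
  }
};
```

With this in place, hint_size() could start with
assert(guards_intact()) so that a run under 'make recheck' aborts with a
usable core as soon as one of the neighbouring words has been trampled.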
Thanx,
--WjW