On 10-12-2015 16:03, Willem Jan Withagen wrote:
I have a failure in:
./unittest_erasure_code_shec_arguments
All tests before this PASS (other than rbd, which is disabled for
the time being).
Which I traced back to code in ErasureCodeShec.cc
Line 218:
unsigned blocksize = (*chunks.begin()).second.length();
After a few iterations I get a "negative" blocksize, which causes
allocations further on to really thrash the system out of swap.
At first I expected it could be due to a Clang typecasting problem.
But after more debugging I found the following in
buffer.h
unsigned length() const {
#if 0
  // DEBUG: verify _len
  unsigned len = 0;
  for (std::list<ptr>::const_iterator it = _buffers.begin();
       it != _buffers.end();
       it++) {
    len += (*it).length();
  }
  assert(len == _len);
#endif
  return _len;
}
Which suggests that debugging was needed at this point earlier in life.
If I enable this debug block, the assert does trigger.
Now the next question is: why? Given the debug snippet, this has needed
analyzing before.
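A minimal sketch of what that disabled check does, outside of Ceph. SegList, cached_len and length_consistent are hypothetical names standing in for the bufferlist, _len and the debug block; the real class has more state than this.

```cpp
#include <list>

// Hypothetical stand-in for a bufferlist: per-segment lengths plus a cached total.
struct SegList {
    std::list<unsigned> segments;  // length of each segment
    unsigned cached_len = 0;       // plays the role of _len

    // Append a segment and keep the cached total in step.
    void append(unsigned n) {
        segments.push_back(n);
        cached_len += n;
    }

    // Recompute the total from the segments and compare it with the cached value,
    // which is the same comparison the "#if 0" block makes with assert().
    bool length_consistent() const {
        unsigned len = 0;
        for (unsigned s : segments) len += s;
        return len == cached_len;
    }
};
```

If the cached field is overwritten from elsewhere, as the assert suggests, length_consistent() returns false even though the segments themselves are untouched.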
And the derived question then is:
What is the easiest path to find out what is actually wrong here.
A further follow-up on this.
After some extensive debugging with gdb and watchpoints, I've come to the
conclusion
that the location of _len is used by more than one part of the code...
The location gets alternately written during:
TestErasureCodeShec_arguments.cc:136
shec_table.insert(std::make_pair(table_key,table_value));
Old value = 63015016
New value = 4294954344
....
Old value = 4294954344
New value = 63015016
.....
It ends up retaining this value 4294954344, which is definitely not the length.
Printing the same value on the Linux variant gives 32, which sounds
much more
sensible....
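For reference, the "negative" reading of that number can be checked with a small sketch. The helper name as_signed32 is hypothetical and not part of Ceph; it only reinterprets a 32-bit unsigned value as two's-complement signed.

```cpp
#include <cstdint>

// Hypothetical helper: read a 32-bit unsigned value as a signed one.
// The result is returned as a 64-bit integer so nothing overflows.
inline std::int64_t as_signed32(std::uint32_t v) {
    // With the top bit set, the signed interpretation is v - 2^32.
    return (v & 0x80000000u) ? static_cast<std::int64_t>(v) - 4294967296LL
                             : static_cast<std::int64_t>(v);
}
```

With this, as_signed32(4294954344u) is -12952, while the value seen on Linux, 32, stays 32.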
So there are a few possibilities that I can think of:
1) Clang gets it wrong
2) There is a mixup of different types of libs that makes for different
offsets in
the bufferlist structs
3) the bufferlist code has portability issues
4) the bufferlist code has errors that do not show with gcc
Most likely it will be either 2) or 3) ....
But other suggestions are welcome...
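One possible way to test possibility 2) is to dump the size and member offsets of a struct from each library or translation unit and compare the output. This is only a sketch: Sample and layout_of_sample are hypothetical, and the real bufferlist has different members and is not guaranteed to be standard-layout, so offsetof may not apply to it directly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in struct; not the actual bufferlist layout.
struct Sample {
    void*    head;
    unsigned len;
    unsigned flags;
};

// Describe the layout as text so the result from two builds can be diffed.
inline std::string layout_of_sample() {
    return "sizeof=" + std::to_string(sizeof(Sample)) +
           " offsetof(len)=" + std::to_string(offsetof(Sample, len)) +
           " offsetof(flags)=" + std::to_string(offsetof(Sample, flags));
}
```

If the same function, compiled into the test binary and into the library, returns different strings, the two sides disagree about where the members live.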
And since bufferlists are at the center of Ceph, better get things right.
So I'm going to go over the test/bufferlist.cc code and see what is in
there.
And/or extract a less convoluted example from
TestErasureCodeShec_arguments.cc
and see if it is in there as well.
--WjW