Ai, top posting. This makes it really difficult to follow the email if
you have not read the first parts :-/ Please remember to inline or bottom
post when replying.

On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
> looks like the iobref (and the iobuf) was allocated in protocol/server..
>
> (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
> 0xbb11a438: 0xbb18ba80 0x00000001 0x00000068 0x00000040
> 0xbb11a448: 0xbb1e2018 0xcafebabe 0x00000000 0x00000000
> 0xbb11a458: 0x00000003 0x00000003 0x00000008 0x00000003
> 0xbb11a468: 0x0000000c 0x00000003 0x0000000e 0x00000003
>
> 8 bytes before the magic header (0xcafebabe) lives the xlator ("this")
> that invoked GF_MALLOC. Here it's:
>
> (gdb) p *(xlator_t *)0xbb1e2018
> $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
> "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
> children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
> fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
> dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
> prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
> init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
> mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
> notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
> {{min = 0, max = 0, total = 0, std = 0, mean = 0,
> count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
> graph = 0xbb1c30f8, itable = 0x0,
> init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
> {num_types = 144, rec = 0xbb1c6000}, winds = 0,
> switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
>
> looking into it more. if the above rings a bell for someone, let us know.

Going by the output from gdb above and the below layout:

  $ printf 'type=%d\nsize=%d\n' 0x00000068 0x00000040
  type=104
  size=64

This means that protocol/server did a GF_?ALLOC(64, 104). The 104 is an
enum value for the mem-type, and libglusterfs/src/mem-types.h points to
gf_common_mt_iobrefs.
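For anyone who wants to repeat this decoding on other dumps, here is a
minimal Python sketch of the same arithmetic. It assumes 32-bit words (as
in the gdb output above) and the field order from the layout quoted
further down in this thread: type, size, xlator pointer, header magic,
then 8 bytes before the data. The names decode_header, WORD and PADDING
are made up for this example and are not part of GlusterFS.

```python
HEADER_MAGIC = 0xCAFEBABE   # GF_MEM_HEADER_MAGIC from the quoted layout
WORD = 4                    # assumption: 32-bit build, as in the dump
PADDING = 8                 # assumption: the "8 bytes" slot before the data


def decode_header(base, words):
    """Decode the allocation header around the first header magic.

    base  -- address of words[0]
    words -- 32-bit values as printed by gdb's x/16x
    """
    idx = words.index(HEADER_MAGIC)
    if idx < 3:
        raise ValueError("dump starts too late to contain type/size/xlator")
    magic_addr = base + idx * WORD
    data_addr = magic_addr + WORD + PADDING
    size = words[idx - 2]
    return {
        "type": words[idx - 3],       # mem-type enum value
        "size": size,                 # requested allocation size
        "xlator": words[idx - 1],     # pointer stored right before the magic
        "data": data_addr,            # address handed back to the caller
        "trailer": data_addr + size,  # where the trailer magic would sit
    }


# The first two rows of the x/16x output quoted above.
dump = [0xbb18ba80, 0x00000001, 0x00000068, 0x00000040,
        0xbb1e2018, 0xcafebabe, 0x00000000, 0x00000000]
hdr = decode_header(0xbb11a438, dump)
print("type=%d size=%d xlator=%#x data=%#x trailer=%#x" % (
    hdr["type"], hdr["size"], hdr["xlator"], hdr["data"], hdr["trailer"]))
```

Under those assumptions the data address comes out as 0xbb11a458, which
matches iobref->iobrefs, and the trailer magic would be expected at
0xbb11a498.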
There is only one function that uses gf_common_mt_iobrefs, which is
iobref_new(). protocol/server calls iobref_new() only once directly, in
server_submit_reply() (there could be some other indirect calls too).

I do not quickly see how the issue can happen with the analyzed data in
this email. Possibly an earlier allocation (memory address wise) went
awry and caused the wreckage. We may need to follow these diagnostic
steps back upwards and try to find the first occurrence where 0xcafebabe
is followed by 0xcafebabe instead of 0xbaadf00d.

That's the only idea I have for now, but I'll keep thinking of something
that could make this easier.

Note: the iobref structure is used a lot, which makes it a likely
structure to blow away other structures when something else frees some
memory but still wants to use it afterwards. I think a use-after-free
could be one cause for this.

Niels

>
> -venky
>
> On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> > On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
> >> Hi
> >>
> >> The merge of http://review.gluster.org/9953/ removed a few crashes from
> >> NetBSD regression tests, but the thing remains utterly broken since the
> >> merge of http://review.gluster.org/9708/ though I cannot tell if I have
> >> bugs left over from this commit or if I face new problems.
> >>
> >> Here are the known problems so far:
> >
> > ...snip! I'll only give some info to your 2nd point.
> >
> >> 2) I still experience memory corruption, which usually crashes glusterfsd
> >> because some pointer was replaced by value 0x3. This strikes on iobref
> >> most of the time, but it can happen elsewhere.
> >>
> >> I would be glad if someone could help here. On nbslave70:/autobuild I
> >> added code to check for iobref/iobuf sanity at random places (by calling
> >> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
> >> but I have not been able to spot the source of the problem yet.
> >>
> >> The weird thing is that memory seems to always be overwritten by the
> >> same values, and the magic 0xcafebabe number before the buffer is
> >> preserved. Here is an example, where iobref->iobrefs = 0xbb11a458:
> >> 0xbb11a44c: 0xcafebabe 0x00000000 0x00000000 0x00000003
> >> 0xbb11a45c: 0x00000003 0x00000008 0x00000003 0x0000000c
> >> 0xbb11a46c: 0x00000003 0x0000000e 0x00000003 0x00000010
> >> 0xbb11a47c: 0x00000003 0x00000009 0x00000003 0x0000000d
> >> 0xbb11a48c: 0x00000003 0x00000015 0x00000003 0x00000016
> >> 0xbb11a49c: 0x00000003 0x00000032 0x00000034 0xbb1e2018
> >> 0xbb11a4ac: 0xcafebabe 0x00000000 0x00000000 0xbb11a5d8
> >
> > Recently I was looking into something that involved some more
> > understanding of GF_MALLOC(). I did not really continue with it because
> > other things got a higher priority. But maybe this layout helps you a
> > little:
> >
> >         :                      :
> >         :                      :
> >         +----------------------+
> >         | GF_MEM_TRAILER_MAGIC |
> >         +----------------------+
> >         |                      |
> >         |         ...          |
> >         |                      |
> >         +----------------------+
> >         |       8 bytes        |
> >         +----------------------+
> >         | GF_MEM_HEADER_MAGIC  |
> >         +----------------------+
> >         |      *xlator_t       |
> >         +----------------------+
> >         |         size         |
> >         +----------------------+
> >         |         type         |
> >         +----------------------+
> >         :                      :
> >         :                      :
> >
> >   #define GF_MEM_HEADER_MAGIC  0xCAFEBABE
> >   #define GF_MEM_TRAILER_MAGIC 0xBAADF00D
> >
> >
> > Because there is no 0xbaadf00d in your memory dump, I would assume that
> > the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
> > a leftover from a previous allocation.
> >
> > You could try to run a test with more strict memory enforcing. All the
> > GF_ASSERT() calls will actually call abort() in that case, and it may
> > make things a little easier to debug.
> > You would pass --enable-debug to the configure command line:
> >
> >   $ ./configure --enable-debug
> >
> > I hope that we will be able to set up scheduled automated regression
> > tests with --enable-debug build binaries. It may be helpful to catch
> > unintended NULL usage a little earlier.
> >
> > HTH,
> > Niels
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel