On Wed, Mar 25, 2015 at 4:28 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> Ai, top posting, this makes it really difficult to follow the email if
> you have not read the first parts :-/ Please remember to inline or
> bottom post when replying.
>
> On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
>> looks like the iobref (and the iobuf) was allocated in protocol/server.
>>
>> (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
>> 0xbb11a438: 0xbb18ba80 0x00000001 0x00000068 0x00000040
>> 0xbb11a448: 0xbb1e2018 0xcafebabe 0x00000000 0x00000000
>> 0xbb11a458: 0x00000003 0x00000003 0x00000008 0x00000003
>> 0xbb11a468: 0x0000000c 0x00000003 0x0000000e 0x00000003
>>
>> The word just before the magic header (0xcafebabe) holds the xlator
>> ("this") that invoked GF_MALLOC. Here it's:
>>
>> (gdb) p *(xlator_t *)0xbb1e2018
>> $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
>> "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
>> children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
>> fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
>> dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
>> prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
>> init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
>> mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
>> notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
>> {{min = 0, max = 0, total = 0, std = 0, mean = 0,
>> count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
>> graph = 0xbb1c30f8, itable = 0x0,
>> init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
>> {num_types = 144, rec = 0xbb1c6000}, winds = 0,
>> switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
>>
>> looking into it more. if the above strikes a bell to someone, let us know.
>
> Going by the output from gdb above and the below layout:
>
> $ printf 'type=%d\nsize=%d\n' 0x00000068 0x00000040
> type=104
> size=64
>
> This means that the protocol/server did a GF_?ALLOC(64, 104). The 104 is
> an enum for the mem-type and libglusterfs/src/mem-types.h points to
> gf_common_mt_iobrefs. There is only one function that uses
> gf_common_mt_iobrefs, which is iobref_new().
>
> protocol/server calls iobref_new() only once directly (there could be
> some other indirect calls too) in server_submit_reply().

yes, that's the only place in protocol/server that calls iobref_new().

>
> I do not quickly see how the issue can happen with the analyzed data in
> this email. Possibly an allocation preceding this one (address-wise)
> went awry and caused the wreckage. We may need to follow these
> diagnostic steps back upwards and try to find the first occurrence where
> 0xcafebabe is followed by 0xcafebabe instead of 0xbaadf00d.

What's interesting is that the number of used iobufs is zero, yet the
->iobrefs slots hold non-NULL values (iobref_unref() iterates ->alloced
times and frees anything which isn't NULL). Someone put them there.

(gdb) p *ie->ie_iobref
$1 = {lock = {pts_magic = 2004287495, pts_spin = 0 '\000', pts_flags = 0},
  ref = 1, iobrefs = 0xbb11a458, alloced = 16, used = 0}

Emmanuel,

Could I run some tests on nbslave70? I plan to disable some translators.
Just running the AFR test cases should trigger the segfault, correct?

>
> That's the only idea I have for now, but I'll keep thinking of something
> that could make this easier.
>
> Note: the iobref structure is used a lot, which makes it a likely
> structure to blow away other structures when something else frees some
> memory but wants to use it afterwards. I think a use-after-free could
> be one cause for this.
>
> Niels
>
>>
>> -venky
>>
>> On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
>> > On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
>> >> Hi
>> >>
>> >> The merge of http://review.gluster.org/9953/ removed a few crashes from
>> >> NetBSD regression tests, but the thing remains utterly broken since the
>> >> merge of http://review.gluster.org/9708/ though I cannot tell if I have
>> >> bugs left over from this commit or if I face new problems.
>> >>
>> >> Here are the known problems so far:
>> >
>> > ...snip! I'll only give some info on your 2nd point.
>> >
>> >> 2) I still experience memory corruption, which usually crashes
>> >> glusterfsd because some pointer was replaced by the value 0x3. This
>> >> strikes on iobref most of the time, but it can happen elsewhere.
>> >>
>> >> I would be glad if someone could help here. On nbslave70:/autobuild I
>> >> added code to check for iobref/iobuf sanity at random places (by calling
>> >> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
>> >> but I have not been able to spot the source of the problem yet.
>> >>
>> >> The weird thing is that memory seems to always be overwritten by the
>> >> same values, and the magic 0xcafebabe number before the buffer is
>> >> preserved. Here is an example, where iobref->iobrefs = 0xbb11a458:
>> >> 0xbb11a44c: 0xcafebabe 0x00000000 0x00000000 0x00000003
>> >> 0xbb11a45c: 0x00000003 0x00000008 0x00000003 0x0000000c
>> >> 0xbb11a46c: 0x00000003 0x0000000e 0x00000003 0x00000010
>> >> 0xbb11a47c: 0x00000003 0x00000009 0x00000003 0x0000000d
>> >> 0xbb11a48c: 0x00000003 0x00000015 0x00000003 0x00000016
>> >> 0xbb11a49c: 0x00000003 0x00000032 0x00000034 0xbb1e2018
>> >> 0xbb11a4ac: 0xcafebabe 0x00000000 0x00000000 0xbb11a5d8
>> >
>> > Recently I was looking into something that involved some more
>> > understanding of GF_MALLOC(). I did not really continue with it because
>> > other things got a higher priority.
>> > But, maybe this layout helps you a
>> > little:
>> >
>> >   :                      :
>> >   :                      :
>> >   +----------------------+
>> >   | GF_MEM_TRAILER_MAGIC |
>> >   +----------------------+
>> >   |                      |
>> >   |         ...          |
>> >   |                      |
>> >   +----------------------+
>> >   |       8 bytes        |
>> >   +----------------------+
>> >   | GF_MEM_HEADER_MAGIC  |
>> >   +----------------------+
>> >   |      *xlator_t       |
>> >   +----------------------+
>> >   |         size         |
>> >   +----------------------+
>> >   |         type         |
>> >   +----------------------+
>> >   :                      :
>> >   :                      :
>> >
>> > #define GF_MEM_HEADER_MAGIC 0xCAFEBABE
>> > #define GF_MEM_TRAILER_MAGIC 0xBAADF00D
>> >
>> >
>> > Because there is no 0xbaadf00d in your memory dump, I would assume that
>> > the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
>> > a leftover from a previous allocation.
>> >
>> > You could try to run a test with stricter memory enforcement. All the
>> > GF_ASSERT() calls will actually call abort() in that case, and it may
>> > make things a little easier to debug. You would pass --enable-debug to
>> > the configure command line:
>> >
>> >   $ ./configure --enable-debug
>> >
>> > I hope that we will be able to set up scheduled automated regression
>> > tests with --enable-debug build binaries. It may be helpful to catch
>> > unintended NULL usage a little earlier.
>> >
>> > HTH,
>> > Niels
>> > _______________________________________________
>> > Gluster-devel mailing list
>> > Gluster-devel@xxxxxxxxxxx
>> > http://www.gluster.org/mailman/listinfo/gluster-devel