On Thu, 2019-04-18 at 16:59 -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
> > >
> > > ----- Original Message -----
> > > > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
> > > > >
> > > > > ----- Original Message -----
> > > > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > Today while I was looking into a vmcore, I suddenly got the
> > > > > > > message in $SUBJECT.
> > > > > > > It started after I did a search into the process stack pages
> > > > > > > (search -t), and for each command I ran afterwards I kept
> > > > > > > getting that message.
> > > > > > > For example:
> > > > > > >
> > > > > > > $ retrace-server-interact 603967269 crash
> > > > > > > ...
> > > > > > > crash> search -t ffff88040a0d5280
> > > > > > >
> > > > > > > search: invalid list entry: 0
> > > > > > >
> > > > > > > search: invalid list entry: 0
> > > > > > >
> > > > > > > search: invalid list entry: 0
> > > > > > > PID: 606 TASK: ffff88082d226eb0 CPU: 5 COMMAND: "xfsaild/dm-0"
> > > > > > > ffff88083ff5b948: ffff88040a0d5280
> > > > > > > ffff88083ff5b990: ffff88040a0d5280
> > > > > > > ffff88083ff5baa8: ffff88040a0d5280
> > > > > > > ffff88083ff5baf0: ffff88040a0d5280
> > > > > > > ffff88083ff5bcf0: ffff88040a0d5280
> > > > > > > ffff88083ff5bd38: ffff88040a0d5280
> > > > > > > ffff88083ff5bd98: ffff88040a0d5280
> > > > > > >
> > > > > > >
> > > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > > >
> > > > > > > crash> ps -m | grep UN
> > > > > > > [ 0 00:00:00.146] [UN] PID: 1811 TASK: ffff880c17bd1fa0 CPU: 1 COMMAND: "cp"
> > > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > > >
> > > > > > > I guess this comes from a possibly corrupted vmcore (I just got
> > > > > > > it from this vmcore), but I wonder why every new command keeps
> > > > > > > returning the same message.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Pier
> > > > > > >
> > > > > > > --
> > > > > > > Crash-utility mailing list
> > > > > > > Crash-utility@xxxxxxxxxx
> > > > > > > https://www.redhat.com/mailman/listinfo/crash-utility
> > > > > >
> > > > > > FWIW, I just pulled this up after plambri pinged me. This is the
> > > > > > backtrace that is being hit, though I've not dug in more:
> > > > > >
> > > > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > > 3820                            error(INFO, "\ninvalid list entry: 0\n");
> > > > > > (gdb) list
> > > > > > 3815                    return -1;
> > > > > > 3816            }
> > > > > > 3817
> > > > > > 3818            if (next == 0) {
> > > > > > 3819                    if (ld->flags & LIST_HEAD_FORMAT) {
> > > > > > 3820                            error(INFO, "\ninvalid list entry: 0\n");
> > > > > > 3821                            if (close_hq_on_return)
> > > > > > 3822                                    hq_close();
> > > > > > 3823                            return -1;
> > > > > > 3824                    }
> > > > > > (gdb) bt
> > > > > > #0  do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > > #1  0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at memory.c:8724
> > > > > > #2  dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > > > > > #3  0x000000000047f3eb in last_vmalloc_address () at memory.c:16792
> > > > > > #4  0x0000000000515e6b in x86_64_get_kvaddr_ranges (vrp=0x7fffffffd340) at x86_64.c:8706
> > > > > > #5  0x000000000049c6ae in cmd_search () at memory.c:13988
> > > > > > #6  0x0000000000465f9c in exec_command () at main.c:879
> > > > > > #7  0x00000000004661ca in main_loop () at main.c:826
> > > > > > #8  0x00000000006b21a3 in captured_command_loop (data=<value optimized out>) at main.c:258
> > > > > > #9  0x00000000006b0a8b in catch_errors (func=0x6b2190 <captured_command_loop>, func_args=0x0, errstring=0x90c106 "", mask=6) at exceptions.c:557
> > > > > > #10 0x00000000006b3076 in captured_main (data=<value optimized out>) at main.c:1064
> > > > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0 <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106 "", mask=6) at exceptions.c:557
> > > > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized out>) at main.c:1079
> > > > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at main.c:1099
> > > > > > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at main.c:707
> > > > >
> > > > > Hmmm, the vmap_area list is a list_head type list, so there should
> > > > > never be a NULL "next" pointer.
> > > > >
> > > > > I'm guessing that "kmem -v" also fails? The last vmap_area entry
> > > > > should point back to the global "vmap_area_list" list header, for
> > > > > example:
> > > > >
> > > > > crash> kmem -v | tail
> > > > > ffff96e7ecaaca80 ffff96e54c89c400 ffffffffc0e54000 - ffffffffc0e5a000 24576
> > > > > ffff96e757ffe380 ffff96e4be98f3c0 ffffffffc0e5d000 - ffffffffc0e6d000 65536
> > > > > ffff96e467b33400 ffff96e6a3ae1a00 ffffffffc0e6d000 - ffffffffc0e73000 24576
> > > > > ffff96e85cf4e600 ffff96e752c52b40 ffffffffc0e77000 - ffffffffc0e7c000 20480
> > > > > ffff96e85cf4e380 ffff96e5506c6c00 ffffffffc0e7c000 - ffffffffc0e81000 20480
> > > > > ffff96e802baa500 ffff96e5506c69c0 ffffffffc0e81000 - ffffffffc0e86000 20480
> > > > > ffff96e802baac00 ffff96e5506c6cc0 ffffffffc0e86000 - ffffffffc0e8c000 24576
> > > > > ffff96e574196f80 ffff96e55ffd6c80 ffffffffc0e90000 - ffffffffc0e95000 20480
> > > > > ffff96e574196680 ffff96e55ffd6880 ffffffffc0e95000 - ffffffffc0e9a000 20480
> > > > > ffff96e87c222800 ffff96e5496ca680 ffffffffc0e9a000 - ffffffffc0ea4000 40960
> > > > > crash> vmap_area ffff96e87c222800
> > > > > struct vmap_area {
> > > > >   va_start = 18446744072651120640,
> > > > >   va_end = 18446744072651161600,
> > > > >   flags = 4,
> > > > >   rb_node = {
> > > > >     __rb_parent_color = 18446628510972342169,
> > > > >     rb_right = 0x0,
> > > > >     rb_left = 0xffff96e574196698
> > > > >   },
> > > > >   list = {
> > > > >     next = 0xffffffffae69af90,
> > > > >     prev = 0xffff96e5741966b0
> > > > >   },
> > > > >   purge_list = {
> > > > >     next = 0x0,
> > > > >     prev = 0xdead000000000200
> > > > >   },
> > > > >   vm = 0xffff96e5496ca680,
> > > > >   callback_head = {
> > > > >     next = 0x0,
> > > > >     func = 0xffff96e71d51aa00
> > > > >   }
> > > > > }
> > > > > crash> sym 0xffffffffae69af90
> > > > > ffffffffae69af90 (D) vmap_area_list
> > > > > crash>
> > > > >
> > > > > Dave
> > > > >
> > > >
> > > > Yeah, kmem -v fails as well:
> > > > crash> kmem -v
> > > >
> > > > kmem: invalid list entry: 0
> > > > WARNING: malloc/free mismatch (29/30)
> > > > crash>
> > > >
> > > >
> > > > There's no indication of an error when crash loads though - only
> > > > after running these commands. Do you think this is a damaged vmcore
> > > > that is not obvious?
> > >
> > > I don't know if it's damaged or if it's a symptom of the kernel crash.
> > > Is the kernel crash happening while the vmlist is being modified?
> > >
> >
> > No, there are no active processes modifying the vmap_area_list.
> >
> > It is a crash due to memory corruption and there are 3rd party modules.
> > But the crash is inside xfs and does not appear in any way related to
> > this, nor are the 3rd party modules in any backtraces.
> >
> >
> > > It's not obvious because it would only be seen when dump_vmlist() is
> > > called.
> > > When dump_vmlist() calls dump_vmap_area(), and do_list() returns -1
> > > back to dump_vmap_area(), it gets used as a loop-ending index, and then
> > > causes presumably bogus values to get returned:
> > >
> > > dump_vmap_area(struct meminfo *vi)
> > > {
> > >         ...
> > >
> > >         ld->end = symbol_value("vmap_area_list");
> > >         cnt = do_list(ld);
> > >
> > >         for (i = 0; i < cnt; i++) {
> > >         ...
> > >
> > >         if (vi->flags & GET_HIGHEST)
> > >                 vi->retval = start+size;
> > >
> > >         if (vi->flags & GET_VMLIST_COUNT)
> > >                 vi->retval = count;
> > >
> > >         if (vi->flags & VMLIST_VERIFY)
> > >                 vi->retval = verified;
> > > }
> > >
> > > Maybe dump_vmap_area() should do an error(FATAL, ...) if cnt is -1?
> > > Although, that would kill all search command attempts. It's hard to
> > > say.
> > >
> > > Dave
> > >
> > >
> >
> > I would maybe consider something like this, which seems to fix the
> > persistent malloc/free mismatch errors and give some results?
> >
> > crash> kmem -v
> >
> > kmem: invalid list entry: 0
> > crash>
> > crash> kmem -v
> >
> > kmem: invalid list entry: 0
> > crash> search 0xdeadbeef
> >
> > search: invalid list entry: 0
> >
> > search: invalid list entry: 0
> >
> > search: invalid list entry: 0
> > ffff88078d2cda80: deadbeef
> > crash>
> > crash> kmem -v
> >
> > kmem: invalid list entry: 0
> > crash> quit
> >
> >
> > $ git diff memory.c
> > diff --git a/memory.c b/memory.c
> > index 8cdab06..7161d9d 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
> >         ld->list_head_offset = OFFSET(vmap_area_list);
> >         ld->end = symbol_value("vmap_area_list");
> >         cnt = do_list(ld);
> > +       if (cnt < 0) {
> > +               vi->retval = 0;
> > +               FREEBUF(vmap_area_buf);
> > +               return;
> > +       }
> >
> >         for (i = 0; i < cnt; i++) {
> >                 if (!(pc->curcmd_flags & HEADER_PRINTED) && (i == 0) &&
> >
> > --
>
> I was wondering how the search command would handle its call to
> machdep->get_kvaddr_ranges() with the patch above -- which would return 0
> as the vmalloc address range's "end" address.
> But given your output above, apparently it seems to work around it.
>
> Thanks,
>   Dave
>

As far as I could tell, the code properly checks for a non-zero
meminfo.retval before proceeding in all instances.

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility