----- Original Message -----
> I would like to give some more info.
>
> It is a dual-core system (ARM).
> Both cores are stuck at wfi (wait for interrupt),
> and we observe that the timer counter is far ahead of the comparators,
> so we never get a local timer interrupt, and nobody is there to wake the cpu up.
>
> So we observe the freeze.
>
> Regards,
> Oza.

I don't know much about the ARM architecture, and the only sample SMP ARM
dumpfile I have on hand shows the non-panicking cpu blocked in default_idle().
So I don't understand how "wfi" would come into play.

What does "bt -a" show?

>
> Some more info:
> I am debugging the crash utility with gdb and getting the following stack trace.
>
> crash> timer
> TVEC_BASES[0]: c0a419c0
>   JIFFIES
>   4297762
>     EXPIRES  TIMER_LIST  FUNCTION
>         128   c1621ea8   c007260c  <idle_worker_timeout>
>       30208   c0b81f04   c04e4244  <inet_frag_secret_rebuild>
>       30720   c0b7f264   c0461440  <flow_cache_new_hashrnd>
>       30840   dba2be04   c0068ebc  <process_timeout>
>       38228   dbae5e04   c0068ebc  <process_timeout>
>    11796480   c097cb64   c0010aa4  <sched_clock_poll>
>  4294937694   c0a6f118   c026f820  <rx_timeout_handler>
>  4294945658   c16238fc   c007412c  <delayed_work_timer_fn>
>  4294945667   d811be14   c0068ebc  <process_timeout>
>  4294945700   c16237cc   c007412c  <delayed_work_timer_fn>
>  4294945700   c16236e0   c007412c  <delayed_work_timer_fn>
>  4294946020   c0a1dcbc   c007412c  <delayed_work_timer_fn>
>  4294946029   dca8f884   c007412c  <delayed_work_timer_fn>
>  4294946504   c0b871c4   c007412c  <delayed_work_timer_fn>
>  4294950720   c0b81d6c   c007412c  <delayed_work_timer_fn>
>
> Breakpoint 2, do_list (ld=0xff961c78) at tools.c:3507
> 3507            error(INFO, "\ninvalid list entry: %lx\n", next);
> (gdb) bt
> #0  do_list (ld=0xff961c78) at tools.c:3507
> #1  0x0811de03 in do_timer_list (vec_kvaddr=3699761524, size=256,
>     vec=0x85c9f40, option=0x0, highest=0x0, tv=0xff962ec4) at kernel.c:6983
> #2  0x0811c9d3 in dump_timer_data_tvec_bases_v2 () at kernel.c:6678
> #3  0x0811afac in dump_timer_data () at kernel.c:6370
> #4  0x0811af8a in cmd_timer () at kernel.c:6329
> #5  0x080910a1 in exec_command () at main.c:818
> #6  0x08090ec7 in main_loop () at main.c:766
> #7  0x081bf35a in current_interp_command_loop ()
> #8  0x081bfbcf in captured_command_loop ()
> #9  0x081beddc in catch_errors ()
> #10 0x081c0a9a in captured_main ()
> #11 0x081beddc in catch_errors ()
> #12 0x081c0adc in gdb_main ()
> #13 0x081c0b29 in gdb_main_entry ()
> #14 0x08121590 in gdb_main_loop (argc=2, argv=0xff964014) at gdb_interface.c:76
> #15 0x08090c01 in main (argc=3, argv=0xff964014) at main.c:671
>
> This is exactly where I hit the invalid entry.

Right, I understand where the error message came from. The crash utility's
do_list() function is simply reporting what it sees in the list_head-type
linked list that it was following.

I have only seen these types of timer command errors in vmcores that were
generated with the "snap.so" extension module, or when running the command
on a live system. Both of those scenarios make perfect sense, because the
underlying kernel was running and modifying the timer-related data structures
while the memory was being copied.

Presuming that the dump was taken with kdump, you would typically expect
the timer data structures to be stable.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility