Hi, We're running a 2.6.38 derivative (Ubuntu kernel 2.6.38-8-server) and we're seeing quite a few of the following. Yes I know it's an old kernel but I'm hoping that it might ring a bell with somebody who can point me at a patch (or shed some light). [22946.698115] general protection fault: 0000 [#1] SMP [22946.700264] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map [22946.703589] CPU 11 [22946.704437] Modules linked in: xt_comment act_police cls_u32 sch_ingress cls_fw sch_htb ebt_arp ebt_ip ipmi_devintf ipmi_si ipmi_msghandler xt_recent xt_multiport ebtable_nat ebtables ipt_MASQUERADE iptable_nat xt_CHECKSUM iptable_mangle bridge kvm_intel kvm nbd 8021q garp stp ib_iser rdma_cm ib_cm iw_cm ipt_REJECT ib_sa ib_mad ipt_LOG vesafb ib_core ib_addr xt_limit iscsi_tcp xt_tcpudp libiscsi_tcp libiscsi ipt_addrtype scsi_transport_iscsi xt_state ip6table_filter ip6_tables nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables i7core_edac serio_raw ghes lp edac_core hed parport xfs exportfs usbhid hid igb hpsa dca [22946.733651] [22946.734244] Pid: 96, comm: ksmd Not tainted 2.6.38-8-server #42-Ubuntu HP SE2170s /SE2170s [22946.738822] RIP: 0010:[<ffffffff8114f74e>] [<ffffffff8114f74e>] remove_rmap_item_from_tree+0x9e/0x140 [22946.864990] RSP: 0018:ffff880bdc3dbdf0 EFLAGS: 00010286 [22946.929703] RAX: ffff880b0ef3b7f0 RBX: ffff880b082d3fc0 RCX: ffffea00534c94e0 [22946.994971] RDX: 0000880b081ef670 RSI: ffff8817de31ff53 RDI: ffffea00534c94d8 [22947.058724] RBP: ffff880bdc3dbe10 R08: 000000000001b555 R09: ffff880bc4a83ffc [22947.121702] R10: ffff880bc4a83000 R11: 0000000000000001 R12: ffff8817de31ff50 [22947.185048] R13: ffffea00534c94d8 R14: ffff880bdc3dbe98 R15: ffff880bdc3d2dc0 [22947.249391] FS: 0000000000000000(0000) GS:ffff88183fca0000(0000) knlGS:0000000000000000 [22947.380838] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [22947.449947] CR2: 00007f8b66bebd00 CR3: 0000000001a03000 CR4: 00000000000026e0 [22947.521049] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [22947.592299] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [22947.661556] Process ksmd (pid: 96, threadinfo ffff880bdc3da000, task ffff880bdc3d2dc0) [22947.796881] Stack: [22947.863663] ffffffff81074b10 ffff880bdc3d2dc0 ffffea0027eba0a0 ffff880b082d3fc0 [22947.999100] ffff880bdc3dbe70 ffffffff8115030c ffff880bdc3dbe70 ffffffff8114fdcc [22948.134245] ffff880b082d3fc0 0000000000000000 0000000000000000 ffff880bdc3d2dc0 [22948.269277] Call Trace: [22948.334258] [<ffffffff81074b10>] ? process_timeout+0x0/0x10 [22948.399750] [<ffffffff8115030c>] cmp_and_merge_page+0x2c/0x3f0 [22948.463915] [<ffffffff8114fdcc>] ? scan_get_next_rmap_item+0x29c/0x450 [22948.527601] [<ffffffff8115077f>] ksm_scan_thread+0xaf/0x2a0 [22948.590328] [<ffffffff81087940>] ? autoremove_wake_function+0x0/0x40 [22948.652426] [<ffffffff811506d0>] ? ksm_scan_thread+0x0/0x2a0 [22948.713048] [<ffffffff810871f6>] kthread+0x96/0xa0 [22948.771932] [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10 [22948.830631] [<ffffffff81087160>] ? kthread+0x0/0xa0 [22948.887967] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10 [22948.945959] Code: 28 4c 89 e7 e8 f4 f8 ff ff 48 85 c0 49 89 c5 74 d2 f0 0f ba 28 00 19 c0 85 c0 0f 85 99 00 00 00 48 8b 43 30 48 8b 53 38 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 01 10 00 00 00 ad de 48 ba [22949.127753] RIP [<ffffffff8114f74e>] remove_rmap_item_from_tree+0x9e/0x140 [22949.188851] RSP <ffff880bdc3dbdf0> What I've learned so far from the crash dump is the following. We crash in ksm.c:remove_rmap_item_from_tree() trying to remove an rmap_item from a linked list: 536 static void remove_rmap_item_from_tree(struct rmap_item *rmap_item) 537 { 538 if (rmap_item->address & STABLE_FLAG) { 539 struct stable_node *stable_node; 540 struct page *page; 541 542 stable_node = rmap_item->head; 543 page = get_ksm_page(stable_node); 544 if (!page) 545 goto out; 546 547 lock_page(page); 548 hlist_del(&rmap_item->hlist); <-- this is where we crash When looking at that particular rmap_item, the list node pprev address seems corrupted (its upper 2 bytes are zeroed out): crash> rmap_item ffff880b082d3fc0 struct rmap_item { rmap_list = 0xffff880b081ef000, anon_vma = 0xffff8817ddaac2a8, mm = 0xffff8817de07ad80, address = 140436244759040, oldchecksum = 833604096, { node = { rb_parent_color = 18446612234826284880, rb_right = 0xffff880b0ef3b7f0, rb_left = 0x880b081ef670 }, { head = 0xffff8817de31ff50, hlist = { next = 0xffff880b0ef3b7f0, pprev = 0x880b081ef670 <-- corrupt address? } } } } And the list node by itself: crash> hlist_node 0xffff880b082d3ff0 struct hlist_node { next = 0xffff880b0ef3b7f0, pprev = 0x880b081ef670 <-- corrupt address? } The list seems to be intact except for this one address. A fragment of the list with the corrupted node in the middle: crash> list -s hlist_node ffff880b08311630 ffff880b08311630 struct hlist_node { next = 0xffff880b083115f0, pprev = 0xffff880b08311a30 } ffff880b083115f0 struct hlist_node { next = 0xffff880b081ef9b0, pprev = 0xffff880b08311630 } ffff880b081ef9b0 struct hlist_node { next = 0xffff880b081ef670, pprev = 0xffff880b083115f0 } ffff880b081ef670 struct hlist_node { next = 0xffff880b082d3ff0, pprev = 0xffff880b081ef9b0 } ffff880b082d3ff0 struct hlist_node { next = 0xffff880b0ef3b7f0, pprev = 0x880b081ef670 <-- memory corruption? upper 2 bytes are zeroed out but should be ff ff. } ffff880b0ef3b7f0 struct hlist_node { next = 0xffff8817b6ed8eb0, pprev = 0xffff880b082d3ff0 } ffff8817b6ed8eb0 struct hlist_node { next = 0xffff8817b6ed8e70, pprev = 0xffff880b0ef3b7f0 } ffff8817b6ed8e70 struct hlist_node { next = 0xffff880b1202e2b0, pprev = 0xffff8817b6ed8eb0 } ffff880b1202e2b0 struct hlist_node { next = 0xffff8817187c6b30, pprev = 0xffff8817b6ed8e70 } If I'm not mistaken, the upper 2 bytes of the pprev address are at the end of the rmap_item struct. Could this be a memory corruption/overwrite? Thanks ...Juerg -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>