Hi,

We are facing a slab corruption issue while using the e1000e net driver in certain situations. The problem is very hard to reproduce, and although we know what is corrupted, we cannot figure out why it is corrupted. If you could throw some light on it, that would be great.

Kindly note that this is a legacy system which is actively used in enterprise backup applications, with several units in the field. It runs a 2.6.23 kernel (I agree it is ancient, but we still need to support it).

The panic occurs like this:

[ 5865198.540571] ------------[ cut here ]------------
[ 5865198.605172] kernel BUG at mm/slab.c:2983!
[ 5865198.662501] invalid opcode: 0000 [1] SMP
[ 5865198.737975] CPU 15
[ 5865198.772608] Modules linked in: msr ipmi_watchdog 8021q mptctl dd_iosched(P) uhci_hcd ehci_hcd i2c_i801 i2c_core ipmi_si ipmi_devintf ipmi_msghandler usbcore dd
[ 5865199.132810] Pid: 5504, comm: ddfs Tainted: P M 2.6.23-ddr272789 #1
[ 5865199.221161] RIP: 0010:[<ffffffff8028996b>]  [<ffffffff8028996b>] cache_alloc_refill+0x1bb/0x230
[ 5865199.334792] RSP: 0000:ffffc20006215cb0  EFLAGS: 00010046
[ 5865199.407682] RAX: 000000000000000f RBX: 0000000000000027 RCX: 0000000000000028
[ 5865199.502352] RDX: ffff8110182abd40 RSI: ffff81020b75e000 RDI: ffff8110182afcc0
[ 5865199.597019] RBP: ffff8103368d3000 R08: 0000000000000020 R09: ffff81018bc1b820
[ 5865199.691685] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000014
[ 5865199.786357] R13: ffff81101830d000 R14: ffff8110182abd40 R15: ffff8110182afcc0
[ 5865199.881150] FS: 0000000041ccc960(0063) GS:ffff811018269ec0(0000) knlGS:0000000000000000
[ 5865199.987328] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5865200.065439] CR2: 00002aaaaaaec000 CR3: 00000007919d1000 CR4: 00000000000006e0
[ 5865200.160170] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[ 5865200.254939] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5865200.349703] Process ddfs (pid: 5504, threadinfo ffff810217072000, task ffff8102e43d6000)
[ 5865200.455880] Stack: 0000000000000000 ffff8110182abd50 0000002000000000 ffff81100127e000
[ 5865200.562070]        0000000000000000 0000000000000286 0000000000000020 ffff8110182afcc0
[ 5865200.660782]        0000000000000000 ffffffff8028956d ffff81100127e000 00000000ffffffff
[ 5865200.757206] Call Trace:
[ 5865200.798266]  <IRQ>
[ 5865200.831762]  [<ffffffff8028956d>] kmem_cache_alloc_node+0xfd/0x110
[ 5865200.915023]  [<ffffffff8044817c>] __alloc_skb+0x4c/0x160
[ 5865200.987911]  [<ffffffff804482bb>] __netdev_alloc_skb+0x2b/0x50
[ 5865201.067036]  [<ffffffff880f8746>] :e1000e:e1000_alloc_rx_buffers+0x1d6/0x250
[ 5865201.160729]  [<ffffffff880f904f>] :e1000e:e1000_clean_rx_irq+0x2ff/0x320
[ 5865201.250308]  [<ffffffff880fb089>] :e1000e:e1000_poll+0x79/0x2e0
[ 5865201.330490]  [<ffffffff8810c6b5>] :e1000e:__kc_adapter_clean+0x35/0x60
[ 5865201.417960]  [<ffffffff80450516>] net_rx_action+0x96/0x180
[ 5865201.492967]  [<ffffffff8023804b>] __do_softirq+0x8b/0x120
[ 5865201.566893]  [<ffffffff8020d0ac>] call_softirq+0x1c/0x30
[ 5865201.639780]  [<ffffffff8020e6ea>] do_softirq+0x4a/0xb0
[ 5865201.710594]  [<ffffffff8020e5c2>] do_IRQ+0xc2/0x1a0
[ 5865201.778297]  [<ffffffff8020c3c1>] ret_from_intr+0x0/0xa

The stack trace is always like this. There is one more thread working on the same skbuff_head_cache as the task above:
[exception RIP: _spin_lock+7]
RIP: ffffffff804de8c7  RSP: ffff8105cffa5af0  RFLAGS: 00000082
RAX: ffff81101830ac00  RBX: ffff8101f82b86c0  RCX: 0000000000000000
RDX: ffff81101830ac00  RSI: 0000000000000020  RDI: ffff8110182abd80
RBP: 0000000000000000   R8: 0000000000000080   R9: 0000000000000080
R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003c
R13: ffff81101830ac00  R14: ffff8110182abd40  R15: ffff8110182afcc0
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
[ffffffff80725000] _spin_lock at ffffffff804de8c7
[ffff8105cffa5af0] cache_alloc_refill at ffffffff80289839
[ffff8105cffa5b40] kmem_cache_alloc_node at ffffffff8028956d
[ffff8105cffa5b70] __alloc_skb at ffffffff8044817c
[ffff8105cffa5bb0] tcp_send_ack at ffffffff804977d6
[ffff8105cffa5bc0] tcp_rcv_established at ffffffff80492d2b
[ffff8105cffa5c00] tcp_v4_do_rcv at ffffffff8049ab05
[ffff8105cffa5c08] autoremove_wake_function at ffffffff80248110
[ffff8105cffa5c60] tcp_prequeue_process at ffffffff80489e5b
[ffff8105cffa5c80] tcp_recvmsg at ffffffff8048a7f6
[ffff8105cffa5d10] sock_common_recvmsg at ffffffff804475a0
[ffff8105cffa5d30] sock_aio_read at ffffffff80442f3f
[ffff8105cffa5dd0] do_sync_read at ffffffff8029076b
[ffff8105cffa5e40] autoremove_wake_function at ffffffff80248110

Tried to debug the core file as below:

ffff8110182afb40 skbuff_fclone_cache      448       5      1072    134     4k
kmem: skbuff_head_cache: partial list: slab: ffff8103368d3000 bad inuse counter: 15
ffff8110182afcc0 skbuff_head_cache        256      14        15      1     4k

crash> slab ffff8103368d3000
struct slab {
  list = {
    next = 0xffff810322925040,
    prev = 0xffff8110182abd40   ---> this is actually the kmem_list3 (l3 cache) object of
                                     skbuff_head_cache; need to check why it is assigned
                                     as the previous slab pointer
  },
  colouroff = 128,
  s_mem = 0xffff8103368d3080,
  inuse = 15,            ------------> This should be 14
  free = 4294967295,     ----> This should be 1. Since it is an unsigned value it can
                               never go below 0, so it appears to have wrapped around.
                               Looks like a double free has happened.
  nodeid = 0
}

crash> kmem_list3 ffff8110182abd40
struct kmem_list3 {
  slabs_partial = {
    next = 0xffff8103368d3000,   ---> referred to above as the slab with the bad inuse
    prev = 0xffff810291f4f000
  },
  slabs_full = {
    next = 0xffff81020b75e000,
    prev = 0xffff810342085000
  },
  slabs_free = {
    next = 0xffff8110182abd60,
    prev = 0xffff8110182abd60
  },
  free_objects = 3165,
  free_limit = 1035,
  colour_next = 1,
  list_lock = {

The slab->inuse counter did not get decremented when a slab object was freed, corrupting the slab list. All the other nodes' prev and next pointers are fine; only this insertion is wrong. So the prev pointer assignment for the ffff8103368d3000 node has definitely gone wrong. Tried to search for references to the corrupt address:

crash> rd -s ffff810000725dc0 20
ffff810000725dc0:  ffff8110182abd40 ffff81101830ac00
ffff810000725dd0:  000000000000003c 0000000000000000
crash> rd -s ffff810000725f60 50
ffff810000725f60:  ffff8110182abd40 ffff81101830ac00
ffff810000725f70:  000000000000003c 0000000000000000

Found that ffff8110182abd40 is the kmem_list3 (L3 cache) object of node 0 present in skbuff_head_cache:

crash> kmem_cache 0xffff8110182afcc0
struct kmem_cache {
  array = {0xffff81101830ac00, 0xffff81101830a800, 0xffff81101830a400, 0xffff81101830a000,
           0xffff81101830bc00, 0xffff81101830b800, 0xffff81101830b400, 0xffff81101830b000,
           0xffff81101830cc00, 0xffff81101830c800, 0xffff81101830c400, 0xffff81101830c000,
           0xffff81101830dc00, 0xffff81101830d800, 0xffff81101830d400, 0xffff81101830d000,
           0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  ...
  nodelists = {0xffff8110182abd40, 0x0, 0x0, 0xffff8110182b0c00, 0xffff8110182b0800,

ffff81101830ac00 is present in the following CPU register map:

START: crash_nmi_callback at ffffffff8021c236
[ffffffff80725eb0] crash_nmi_callback at ffffffff8021c236
[ffffffff80725ec0] notifier_call_chain at ffffffff804e195f
[ffffffff80725ef0] notify_die at ffffffff8024babd
[ffffffff80725f20] default_do_nmi at ffffffff804dfd61
[ffffffff80725f40] do_nmi at ffffffff804e0357
[ffffffff80725f50] nmi at ffffffff804df5ef
[exception RIP: _spin_lock+7]
RIP: ffffffff804de8c7  RSP: ffff8105cffa5af0  RFLAGS: 00000082
RAX: ffff81101830ac00  RBX: ffff8101f82b86c0  RCX: 0000000000000000
RDX: ffff81101830ac00  RSI: 0000000000000020  RDI: ffff8110182abd80
RBP: 0000000000000000   R8: 0000000000000080   R9: 0000000000000080
R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003c
R13: ffff81101830ac00  R14: ffff8110182abd40  R15: ffff8110182afcc0
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
[ffffffff80725000] _spin_lock at ffffffff804de8c7
[ffff8105cffa5af0] cache_alloc_refill at ffffffff80289839
[ffff8105cffa5b40] kmem_cache_alloc_node at ffffffff8028956d
[ffff8105cffa5b70] __alloc_skb at ffffffff8044817c
[ffff8105cffa5bb0] tcp_send_ack at ffffffff804977d6
[ffff8105cffa5bc0] tcp_rcv_established at ffffffff80492d2b
[ffff8105cffa5c00] tcp_v4_do_rcv at ffffffff8049ab05
[ffff8105cffa5c08] autoremove_wake_function at ffffffff80248110
[ffff8105cffa5c60] tcp_prequeue_process at ffffffff80489e5b
[ffff8105cffa5c80] tcp_recvmsg at ffffffff8048a7f6

Here ffff8110182afcc0 is skbuff_head_cache, whereas ffff81101830ac00 is the first entry of its per-CPU array (array[0], i.e. the array_cache for CPU 0).
crash> array_cache 0xffff81101830ac00
struct array_cache {
  avail = 0,            ----> array cache is empty
  limit = 120,
  batchcount = 60,
  touched = 1,
  lock = {
    raw_lock = {
      slock = 1
    },
    tracker = {
      caller = 0,
      owner = 0
    }
  },
  entry = 0xffff81101830ac20
}

crash> kmem -S skbuff_head_cache
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: skbuff_head_cache: partial list: slab: ffff8103368d3000 bad inuse counter: 15
ffff8110182afcc0 skbuff_head_cache        256         14        15      1     4k
kmem: skbuff_head_cache: partial list: slab: ffff8103368d3000 bad inuse counter: 15
SLAB              MEMORY            TOTAL  ALLOCATED  FREE
ffff81020b75e000  ffff81020b75e080     15         14     1
FREE / [ALLOCATED]
  [ffff81020b75e080]
  [ffff81020b75e180]
  [ffff81020b75e280]
  [ffff81020b75e380]
  [ffff81020b75e480]
  [ffff81020b75e580]
   ffff81020b75e680  (cpu 15 cache)   ----> this is the freed object
  [ffff81020b75e780]
  [ffff81020b75e880]
  [ffff81020b75e980]
  [ffff81020b75ea80]
  [ffff81020b75eb80]
  [ffff81020b75ec80]
  [ffff81020b75ed80]
  [ffff81020b75ee80]

I see this pattern in all the references to the corrupt value, but I am not sure how the two are linked. It looks like even when the slab on slabs_partial became full, it was not moved to slabs_full, which made the other thread assume there was still a free object in this slab, and it panics at:

	/*
	 * The slab was either on partial or free list so
	 * there must be at least one object available for
	 * allocation.
	 */
	BUG_ON(slabp->inuse >= cachep->num);

There is a lot of activity on the e1000e driver:

1263: 3567059092  PCI-MSI-edge  eth5   ----> e1000e interrupt count

This is on CPU 15, where the slab corruption happened. The load on the other CPUs is more or less equal; irqbalance is distributing it. We have enabled slab debugging and have been running for a few days, but as I mentioned, the issue is hard to reproduce. Is there any way to figure out when and how the insertion of the slab got corrupted? Can you throw some light on this issue?
Thanks