Re: getting panic during kmalloc

"Shreyansh Jain" <shrey.linux@xxxxxxxxx> · Fri, 11 Jul 2008 11:00:13 +0530

Hi Gagan and List,

Please see my comments inline.

On Thu, Jul 10, 2008 at 6:03 PM, gagan grover <grovershah@xxxxxxxxx> wrote:
> I have tried with GFP_HIGHUSER and I am able to allocate 1M buffers of 24
> bytes.

[snip]

>>> Hi
>>> I have a requirement of creating 1M buffers of 24 bytes.
>>> So, my driver is calling kmalloc in a loop, but it gives the following
>>> panic after some iterations.
>>> The system has 4 GB of RAM, and I was continuously checking top; it had
>>> sufficient memory to allocate.
>>>
>>> ----------- [cut here ] --------- [please bite here ] ---------
>>> Kernel BUG at slab:1773

Looking at this code in 2.6.9 (stock kernel; yours seems to be a
Red Hat release), my guess is that the BUG is in the
cache_alloc_refill function, at:

 1970         l3 = list3_data(cachep);
 1971
 1972         BUG_ON(ac->avail > 0);
 1973         spin_lock(&cachep->spinlock);

[Line numbers differ between the stock and enterprise kernels.] This
call originated from:

__cache_alloc called from kmem_cache_alloc

   2115         if (likely(ac->avail)) {
   2116                 STATS_INC_ALLOCHIT(cachep);
   2117                 ac->touched = 1;
   2118                 objp = ac_entry(ac)[--ac->avail];
   2119         } else {
   2120                 STATS_INC_ALLOCMISS(cachep);
   2121                 objp = cache_alloc_refill(cachep, flags); <--
   2122         }

What I could infer is that the first check, if (likely(ac->avail)),
fails, so the code goes on to call cache_alloc_refill, which in turn
checks ac->avail again. If the BUG is indeed at that BUG_ON, then
ac->avail must have become nonzero between the two checks.

I also came across this link:
http://kerneltrap.org/mailarchive/linux-kernel/2008/3/20/1211424

which describes a case wherein ac->avail was being changed by another
CPU. Does your machine have more than one CPU? (You seem to be running
an SMP kernel.)

I agree that I may be completely off the mark (as this link refers to
the 2.6.24 kernel), but there is probably no harm in checking.

>>> invalid operand: 0000 [1] SMP
>>> CPU 3
I think you *are* working on a multi-processor machine.

Either way, my notion is half-cooked, and only you can cook it
further if you feel it is correct.

>>> Modules linked in: dbg(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev
>>> i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) rdma_cm(U) ib_addr(U) ds
>>> yenta_socked
>>> Pid: 16998, comm: dbg_fmr_create Not tainted 2.6.9-42.ELsmp
>>> RIP: 0010:[<ffffffff80161949>] <ffffffff80161949>{cache_alloc_refill+409}
>>> RSP: 0018:0000010134709e08  EFLAGS: 00010002
>>> RAX: 0000000000000000 RBX: 00000100bff6f728 RCX: 00000100bff6f6e8
>>> RDX: 00000100bff50000 RSI: 0000000000000018 RDI: 00000100bff6f728
>>> RBP: 00000100bfe56000 R08: 0000000000000007 R09: 000001013162b000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 00000100bff6f6c8
>>> R13: 00000100bff6f680 R14: 0000000000000018 R15: 0000000000000003
>>> FS:  0000002a95579b00(0000) GS:ffffffff804e5200(0000)
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: 00000036ea2befc0 CR3: 0000000005da4000 CR4: 00000000000006e0
>>> Process dbg_fmr_create (pid: 16998, threadinfo 0000010134708000, task
>>> 0000010135cb4030)
>>> Stack: 0000000000000018 0000000000000018 00000100bff6f680
>>> 0000010130000000
>>>        000001013162b000 0000007fbfffed20 0000000000000003
>>> ffffffff8016174f
>>>        0000000000000202 0000000000003067
>>> Call Trace:<ffffffff8016174f>{kmem_cache_alloc+90}
>>> <ffffffffa02571bc>{:dbg:dbg_fmr_create+114}
>>>        <ffffffffa025252c>{:dbg:dbg_handle_ioctls+8712}
>>> <ffffffff8018ae05>{sys_ioctl+853}
>>>        <ffffffff8011026a>{system_call+126}
>>>
>>> Code: 0f 0b cc 5d 32 80 ff ff ff ff ed 06 31 d2 41 f7 c6 00 20 00
>>> RIP <ffffffff80161949>{cache_alloc_refill+409} RSP <0000010134709e08>
>>>  <0>Kernel panic - not syncing: Oops

--
Shreyansh
