RE: [PATCH] Fix double free of MPT request frames.

Hi,

On May 26, 2009 at 12:25 pm, Alok Kataria wrote:

> While testing scsi path failover for disks using MPT drivers we hit a
> kernel oops on RHEL 5.1-64bit, while analyzing the problem we noticed
> that this is due to a race present in the mpt scsi code path. The same
> race seems to be present in the latest git kernel code too.


I'm sorry to bring up something from months ago, but I believe I might
be seeing the same problem.  We're running Ubuntu 8.04.3 LTS (Linux
kernel 2.6.24-24-generic), and its MPT code is close enough to what
your patch modifies that the patch applies cleanly except for a couple
of hunks in comments.  So I have a few questions:

1) What was the kernel oops like?  In particular, was it like the one
   I saw?  (I'll add details of the kernel oops later in this email).

2) Has this patch or anything like it been accepted into the kernel?
   I have my doubts, as Kashyap Desai had this to say about it on May
   29, 2009 at 10:55pm (text re-formatted to break up long lines):

   > Consider below two points as potential bugs in your patch.
   >
   > #1. As per MPT Fusion design, driver should not free message frame
   >  which is still with FW. We should make sure that FW will not use
   >  submitted message frame in future before freeing it. There are
   >  multiple ways to make sure FW will not use msg frame.
   >  a. Successful return in ISR for msg frame.
   >  b. HardReset
   >  c. Task Abort (success case)
   >
   > Potential bug in your patch is you are freeing msg frame before HardReset.
   > There is still possibility that FW will use that freed message
   > frame. It can fault IOC or may be IOC hangs.
   >
   > #2. Making TmState to NONE much before completing cleanup work for
   >  previously failed TM.  In general, TmState = TM_STATE_NONE should be
   >  just before you return from mptscsih_tm_wait_for_completion()
   >  failure case.


3) Is this fixed in the latest kernel (currently 2.6.32)?  The code
   has changed quite a bit, so it isn't obvious that the problem still
   exists, nor is it obvious that it has been fixed.
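To make Kashyap's point #1 (and the ordering in #2) concrete for myself, here is a small model in C. Every name in it (frame_event, frame_may_be_freed(), struct tm_model, tm_wait_failed_path()) is hypothetical and does not come from mptbase or mptscsih; it is only a sketch of the ownership rule as I read it, not the driver's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: what the driver knows about a message frame
 * it has posted to the IOC firmware. */
enum frame_event {
    EV_NONE,              /* posted, no completion seen yet */
    EV_REPLY_IN_ISR,      /* (a) successful return in the ISR */
    EV_HARD_RESET,        /* (b) IOC hard reset completed */
    EV_TASK_ABORT_OK,     /* (c) task abort succeeded */
    EV_TASK_ABORT_FAILED  /* abort failed: FW may still use the frame */
};

/* Point #1: a frame may go back on the free list only once the
 * firmware can no longer touch it. */
static bool frame_may_be_freed(enum frame_event ev)
{
    switch (ev) {
    case EV_REPLY_IN_ISR:
    case EV_HARD_RESET:
    case EV_TASK_ABORT_OK:
        return true;
    default:
        return false;
    }
}

/* Hypothetical stand-ins for TM_STATE_NONE and "TM in progress". */
enum tm_state { TM_NONE, TM_IN_PROGRESS };

struct tm_model {
    enum tm_state tm_state;
    bool reset_done;
    bool frame_freed;
};

/* Point #2: on the failure path, finish the cleanup (hard reset,
 * then free the frame) and only then clear the TM state. */
static int tm_wait_failed_path(struct tm_model *m)
{
    m->reset_done = true;                               /* b. HardReset */
    m->frame_freed = frame_may_be_freed(EV_HARD_RESET); /* now safe */
    m->tm_state = TM_NONE;                              /* last step */
    return -1;                                          /* report failure */
}
```

This single-threaded model only records the order of the steps; presumably the ordering matters in the real driver because other code can start using the TM machinery (or the frame) as soon as the state reads NONE.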

Unfortunately, it isn't easy for me to upgrade to the latest kernel:
we want to settle on a long-term supportable Ubuntu release for our
deployments, so we use 8.04.3 LTS, for which the newest kernel so far
is 2.6.24-26-generic.

Here are the details of my kernel oops, as I promised in #1 above:

Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933351] mptscsih: ioc0: Issue of TaskMgmt failed!
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933361] mptscsih: ioc0: task abort:FAILED (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933373] mptscsih: ioc0: attempting target reset! (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933377] sd 2:0:1:0: [sdb] CDB: Read(10): 28 00 01 20 aa c4 00 00 04 00
Dec  8 13:29:29 syntropy-pcc kernel: [ 4267.933431]  target2:0:0: Beginning Domain Validation
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044445] mptscsih: ioc0: target reset: SUCCESS (sc=f6b44a00)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044804]  target2:0:0: Domain Validation Initial Inquiry Failed
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.044874]  target2:0:0: Ending DomainValidation
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045128] printing eip: f89235fa *pde= 00000000
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045299] Oops: 0002 [#1] SMP
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045413] Modules linked in: iscsi_trgt crc32c libcrc32c vmmemctl cpufreq_powersave cpufreq_userspace cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table video output battery sbs sbshc dock iptable_filter ip_tables x_tables iscsi_tcp libiscsi scsi_transport_iscsi lp loop ipv6 container evdev serio_raw ac button intel_agp parport_pc parport i2c_piix4 agpgart shpchp pci_hotplug i2c_core psmouse pcspkr ext3 jbd mbcache sd_mod sg sr_mod cdrom ata_generic floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix pata_acpi libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046097]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046177] Pid: 6, comm: events/0 Not tainted (2.6.24-24-generic #1)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046237] EIP: 0060:[<f89235fa>] EFLAGS: 00010097 CPU: 0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046480] EIP is at mpt_get_msg_frame+0x6a/0x100 [mptbase]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046541] EAX: 0000400a EBX: f7fc5000 ECX: df8c49c0 EDX: 00000001
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046599] ESI: df8c49c0 EDI: 0000000f EBP: f7fc5104 ESP: f7c2bde8
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046656]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046723] Process events/0 (pid: 6, ti=f7c2a000 task=f7c28000 task.ti=f7c2a000)
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046788] Stack: 00000000 c01731e0 00000044 f7c2be6c 0000000a 0f000082 00000202 f7fc5000
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.046922]        f7ea6c14 f7c2bec4 f7fc5000 f8926c5d 00000001 c03ef590 000000d0 0000000c
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047015]        00000000 0000000c f7c8444c 00000000 c0109173 00000000 f7c2bf24 f7c8444c
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047111] Call Trace:
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047269]  [<c01731e0>] __alloc_pages+0x60/0x3a0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047453]  [<f8926c5d>] mpt_config+0x2d/0x2f0 [mptbase]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047510]  [<c0109173>] dma_alloc_coherent+0xc3/0x110
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047567]  [<f88df86f>] mptspi_read_parameters+0x12f/0x3f0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047636]  [<f88e016f>] mptspi_dv_device+0x6f/0x170 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047688]  [<c028063e>] get_device+0xe/0x20
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047744]  [<f893a13e>] scsi_device_get+0x1e/0x50 [scsi_mod]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047814]  [<f893a298>] __scsi_iterate_devices+0x48/0x70 [scsi_mod]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047879]  [<f88e0319>] mptspi_dv_renegotiate_work+0xa9/0xd0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047935]  [<c0179750>] vmstat_update+0x0/0x30
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.047988]  [<f88e0270>] mptspi_dv_renegotiate_work+0x0/0xd0 [mptspi]
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048043]  [<c013cecf>] run_workqueue+0xbf/0x160
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048094]  [<c013d970>] worker_thread+0x0/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048141]  [<c013d9f4>] worker_thread+0x84/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048188]  [<c0140c80>] autoremove_wake_function+0x0/0x40
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048240]  [<c013d970>] worker_thread+0x0/0xe0
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048287]  [<c01409c2>] kthread+0x42/0x70
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048331]  [<c0140980>] kthread+0x0/0x70
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048375]  [<c0105667>] kernel_thread_helper+0x7/0x10
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048479]  =======================
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048536] Code: 00 8d ab 04 01 00 00 89 e8 e8 d3 9e 9f c7 8b 93 08 01 00 00 89 44 24 18 8d 83 08 01 00 00 39 c2 74 54 89 d6 8b 12 8b 46 04 89 f1 <89> 42 04 89 10 89 f8 c7 46 08 00 00 00 00 c7 06 00 01 10 00 88
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.048824] EIP: [<f89235fa>] mpt_get_msg_frame+0x6a/0x100 [mptbase] SS:ESP 0068:f7c2bde8
Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.049231] ---[ end trace 4cf91050ea603706 ]---

I used objdump and matched its output against the oops, and concluded that the oops occurred at 5fa, in the inlined __list_del() shown below:

/home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:157
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
        next->prev = prev;
     5fa:       89 42 04                mov    %eax,0x4(%edx)
/home/darius/ubuntu_src/linux-2.6.24/include/linux/list.h:158
        prev->next = next;
     5fd:       89 10                   mov    %edx,(%eax)

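The oops reports that EDX is 0x1.  The faulting instruction at 5fa,
mov %eax,0x4(%edx), is the store next->prev = prev, and on i386 the
prev field sits 4 bytes into struct list_head, so 5fa is trying to
write to address 0x1 + 4 = 0x5, which fits with this line in the oops: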

Dec  8 13:29:29 syntropy-pcc kernel: [ 4268.045001] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005
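The address arithmetic can be checked with a small C model of struct list_head as laid out on i386 (next first, then prev, each a 4-byte pointer). The names struct list_head_i386 and list_del_write_addr() are illustrative only; uint32_t stands in for a 32-bit pointer so the layout is the same when built on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the kernel's struct list_head on 32-bit x86:
 * two 4-byte pointers, next first, then prev. */
struct list_head_i386 {
    uint32_t next;
    uint32_t prev;
};

/* Address written by "next->prev = prev", i.e. mov %eax,0x4(%edx),
 * when the corrupted next pointer (EDX) has the given value. */
static uint32_t list_del_write_addr(uint32_t next)
{
    return next + (uint32_t)offsetof(struct list_head_i386, prev);
}
```

With next == 0x1 this gives 0x5, the "virtual address 00000005" reported in the oops.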

In short, the next pointer of the message frame at the head of the
free list holds a nonsensical value (0x1), which could be caused by
using a message frame that has already been freed.
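As an illustration of how a double free can corrupt a free list like this, here is a simplified userspace model of a circular doubly linked list. It is not the kernel's list.h or the mptbase free queue; struct node, list_init(), add_tail(), unlink_node() and get_frame() are hypothetical names that mimic the pointer updates of list_add_tail() and __list_del().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list head/entry. */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Same pointer updates as list_add_tail(): insert n before head. */
static void add_tail(struct node *n, struct node *head)
{
    struct node *prev = head->prev;
    n->next = head;
    n->prev = prev;
    prev->next = n;
    head->prev = n;
}

/* Unlink n like __list_del(); n's own pointers are left unchanged
 * (the kernel's list_del() would also poison them). */
static void unlink_node(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

/* Take the first frame off the free list, or NULL if it is empty. */
static struct node *get_frame(struct node *freeq)
{
    struct node *n = freeq->next;
    if (n == freeq)
        return NULL;
    unlink_node(n);
    return n;
}
```

In this model, freeing frame a twice in a row leaves a.next pointing at a itself, and from then on every get_frame() hands out a again, so two users end up sharing one frame. It does not by itself explain the specific value 0x1 seen in my oops.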

--
Darius S. Naqvi
dnaqvi@xxxxxxxxxxxxxxx
http://www.datagardens.com
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
