Re: [libsas] Kernel Crash in smp_execute_task

On 15-03-04 06:29 PM, Praveen Murali wrote:
On second thought, should we even let SMP commands/requests through for SAS end devices (dev->dev_type == SAS_END_DEV)? If so, wouldn't the following patch make more sense? (Also, in my last mail the kernel logs were all messed up; sorry, I didn't realize that when I sent the mail. Trying to fix it here.)

LSI SAS-2 and SAS-3 HBAs accept several SMP requests.
For example, with an LSI 9300-4i4e and lk-3.19.0:

# smp_rep_manufacturer /dev/bsg/sas_host6
Report manufacturer response:
  SAS-1.1 format: 0
  vendor identification: LSI
  product identification: Virtual SMP Port
  product revision level:

That usage is slightly peculiar since the request is issued
on the machine that has sas_host6 (on its PCIe bus). It
cannot be issued from another machine in the same SAS
domain (e.g. via connections to a common expander). Seen
from an expander (or another HBA), an HBA is a SAS_END_DEV.
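
For readers unfamiliar with what a utility such as smp_rep_manufacturer
sends, here is a minimal sketch of the first four bytes of an SMP
REPORT MANUFACTURER INFORMATION request. It assumes the SAS-1.1 style
header, in which bytes 2 and 3 are left as zero. The function and macro
names are made up for illustration and are not taken from smp_utils or
libsas; the CRC that follows the header in a real frame is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMP_FRAME_TYPE_REQUEST          0x40 /* SMP request frame type */
#define SMP_FN_REPORT_MANUFACTURER_INFO 0x01 /* SMP function code */
#define SMP_REQ_HEADER_LEN              4

/*
 * Fill buf with a minimal REPORT MANUFACTURER INFORMATION request header.
 * Returns the number of bytes written, or 0 if the buffer is unusable.
 */
size_t smp_build_report_manufacturer(uint8_t *buf, size_t len)
{
    if (buf == NULL || len < SMP_REQ_HEADER_LEN)
        return 0;

    memset(buf, 0, SMP_REQ_HEADER_LEN);
    buf[0] = SMP_FRAME_TYPE_REQUEST;
    buf[1] = SMP_FN_REPORT_MANUFACTURER_INFO;
    /* bytes 2 and 3 stay zero in this SAS-1.1 style sketch */
    return SMP_REQ_HEADER_LEN;
}
```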

It is not clear to me whether this usage (i.e. sending
SMP requests to your own machine's HBA) would be broken
by what you are proposing.

Also I cannot remember ever reading anything in the SAS
drafts that precludes sending SMP requests to a SAS target
(e.g. an enclosure).


As a side note, FreeBSD doesn't have special device nodes
for expanders. It takes advantage of the common practice
in SAS-2 and SAS-3 expanders of including a SES device.
Hence in FreeBSD you might use:
  # smp_discover /dev/ses3
to list the disposition of the phys in the expander which
includes /dev/ses3

Doug Gilbert


diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 58e6183..a44019a 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -72,6 +72,14 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
         struct sas_internal *i =
                 to_sas_internal(dev->port->ha->core.shost->transportt);

+       /* can we send smp commands to a device? */
+       if (dev->dev_type == SAS_END_DEV) {
+                       printk("%s: can we send a smp request to a device?\n",
+                                  __func__);
+                       res = -ECOMM;
+                       goto out;
+       }
+
         mutex_lock(&dev->ex_dev.cmd_mutex);
         for (retry = 0; retry < 3; retry++) {
                 if (test_bit(SAS_DEV_GONE, &dev->state)) {

-----Original Message-----
From: linux-scsi-owner@xxxxxxxxxxxxxxx [mailto:linux-scsi-owner@xxxxxxxxxxxxxxx] On Behalf Of Praveen Murali
Sent: Wednesday, March 04, 2015 1:43 PM
To: linux-scsi@xxxxxxxxxxxxxxx; dan.j.williams@xxxxxxxxx
Cc: JBottomley@xxxxxxxxxxxxx
Subject: [libsas] Kernel Crash in smp_execute_task

Hi Dan,
   I am experiencing a crash in smp_execute_task when it calls mutex_lock. It occurs with 15K SAS drives, HP model EH0146FAWJB (Seagate model ST9146852SS), connected to a Marvell 88SE9485 SAS/SATA 6Gb/s controller; the kernel version is 3.4.87. The crash happens as soon as I connect this drive; the crash dump is included in this mail.

This mutex lock was introduced in commit 89d3cf6ac3cdc4f15a82709f8c78ed169a98be5b. What I see is that the mutex is initialized only for devices of type EDGE_DEV or FANOUT_DEV, but in my case smp_execute_task gets called with a device of type SAS_END_DEV. To troubleshoot further, I added a check in smp_execute_task so that mutex_lock and mutex_unlock are called only for EDGE_DEV or FANOUT_DEV device types. With that change the disk was detected and there was no crash. I have captured the logs from this run and included them in the mail.

I guess there are a couple of ways to fix this: (1) add the check before using the mutex, or (2) move the mutex to the domain device (from the expander device) and initialize it for all device types. Do you think either of these is a valid approach? Or am I missing something here?

Thanks for your time!
Praveen

--------------------------Begin Change in sas_expander.c-----------------------------------
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 58e6183..505ada2 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -72,7 +72,8 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
         struct sas_internal *i =
                 to_sas_internal(dev->port->ha->core.shost->transportt);

-       mutex_lock(&dev->ex_dev.cmd_mutex);
+       if ((dev->dev_type == EDGE_DEV) || (FANOUT_DEV == dev->dev_type))
+               mutex_lock(&dev->ex_dev.cmd_mutex);
         for (retry = 0; retry < 3; retry++) {
                 if (test_bit(SAS_DEV_GONE, &dev->state)) {
                         res = -ECOMM;
@@ -144,7 +145,8 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
                         task = NULL;
                 }
         }
-       mutex_unlock(&dev->ex_dev.cmd_mutex);
+       if ((dev->dev_type == EDGE_DEV) || (FANOUT_DEV == dev->dev_type))
+               mutex_unlock(&dev->ex_dev.cmd_mutex);

         BUG_ON(retry == 3 && task != NULL);
         sas_free_task(task);
--------------------------End Change in sas_expander.c-----------------------------------
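
The change above repeats the device-type test at the lock and unlock
sites. As a userspace sketch only (not kernel code), the same idea can
be modelled with the test evaluated once so that lock and unlock cannot
disagree. Every name here is hypothetical: the enum is a simplified
stand-in for the libsas device types, and the counter stands in for the
real cmd_mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the libsas device types; values are illustrative. */
enum model_dev_type {
    MODEL_NO_DEVICE,
    MODEL_SAS_END_DEV,
    MODEL_EDGE_DEV,
    MODEL_FANOUT_DEV
};

struct model_dev {
    enum model_dev_type type;
    int lock_depth; /* stands in for the state of ex_dev.cmd_mutex */
};

/* Per the report, only expander types have an initialized command mutex. */
bool model_dev_has_cmd_mutex(enum model_dev_type type)
{
    return type == MODEL_EDGE_DEV || type == MODEL_FANOUT_DEV;
}

/* Models the lock/unlock pairing around the request loop. */
int model_smp_execute(struct model_dev *dev)
{
    const bool locked = model_dev_has_cmd_mutex(dev->type);

    if (locked)
        dev->lock_depth++; /* mutex_lock() in the real code */

    /* ... the request/retry loop would run here ... */

    if (locked)
        dev->lock_depth--; /* mutex_unlock() in the real code */
    return 0;
}
```

Evaluating the condition once also guards against the device type
changing between the two checks, although nothing in this thread says
that can happen.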

--------------------------------------Begin Kernel log after change-------------------------------
[  132.667582] sas: phy-1:0 added to port-1:0, phy_mask:0x1 (5000c5001cb4c151)
[  132.667595] drivers/scsi/mvsas/mv_sas.c 1225:set wide port phy map 1
[  132.687496] sas: DOING DISCOVERY on port 0, pid:5
[  132.687509] sas: DONE DISCOVERY on port 0, pid:5, result:0
[  133.001934] scsi 1:0:0:0: Direct-Access     HP       EH0146FAWJB      HPDD PQ: 0 ANSI: 5
[  133.010371] ssw_validate_device: Not a SATA device; skip
[  133.010638] sd 1:0:0:0: [sdb] Spinning up disk...
[  133.034290] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  133.044469] sd 1:0:0:0: Attached scsi generic sg1 type 0
[  134.012041] .
[  134.668084] sas: broadcast received: 0
[  134.672977] sas: REVALIDATING DOMAIN on port 0, pid:5
[  134.672984] sas_ex_revalidate_domain: calling sas_find_bcast_dev 1
[  134.672991] sas_find_bcast_dev: calling sas_get_ex_change_count
[  134.673129] sas: smp_execute_task: task to dev 5000c5001cb4c151 response: 0x0 status 0x2
[  134.673258] sas: smp_execute_task: task to dev 5000c5001cb4c151 response: 0x0 status 0x2
[  134.673385] sas: smp_execute_task: task to dev 5000c5001cb4c151 response: 0x0 status 0x2
[  134.673394] sas_find_bcast_dev: done
[  134.673400] sas_ex_revalidate_domain: done
[  134.673407] sas: done REVALIDATING DOMAIN on port 0, pid:5, res 0xffffffba
[  135.016018] ....ready
[  138.470597] sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
[  138.476212] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  138.488301] sd 1:0:0:0: [sdb] Sense not available.
[  138.495033] sd 1:0:0:0: [sdb] READ CAPACITY failed
[  138.501825] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  138.516376] sd 1:0:0:0: [sdb] Sense not available.
[  138.524333] sd 1:0:0:0: [sdb] Write Protect is off
[  138.532360] sd 1:0:0:0: [sdb] Mode Sense: 00 00 00 00
[  138.532393] sd 1:0:0:0: [sdb] Asking for cache data failed
[  138.540642] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  138.550085] sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
[  138.558551] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  138.575243] sd 1:0:0:0: [sdb] Sense not available.
[  138.583697] sd 1:0:0:0: [sdb] READ CAPACITY failed
[  138.592104] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  138.609371] sd 1:0:0:0: [sdb] Sense not available.
[  138.618292] sd 1:0:0:0: [sdb] Asking for cache data failed
[  138.627172] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  138.636053] sd 1:0:0:0: [sdb] Attached SCSI disk

--------------------------------------End Kernel log after change-------------------------------
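
The "res 0xffffffba" in the revalidation line above is a negative
errno printed as an unsigned 32-bit value. The small helper below is
a sketch with a made-up name that converts such a value back to a
signed number. The result is -70, and 70 is the value of ECOMM on
Linux, the same errno that appears as "res = -ECOMM" in the patch
context. Where exactly the value originates is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 32-bit value printed with %x back to the signed result it
 * represents, without relying on implementation-defined casts.
 */
int decode_kernel_res(uint32_t raw)
{
    int64_t wide = (int64_t)raw;

    if (wide > INT32_MAX)
        wide -= (int64_t)1 << 32; /* undo two's-complement wraparound */
    return (int)wide;
}
```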

--------------------------------------Begin Kernel crash dump-------------------------------
[  366.946212] scsi 3:0:1:0: Direct-Access     HP       EH0146FAWJB      HPDD PQ: 0 ANSI: 5
[  366.946905] ssw_validate_device: Not a SATA device; skip
[  366.947225] sd 3:0:1:0: Attached scsi generic sg1 type 0
[  366.947370] sd 3:0:1:0: [sdb] Spinning up disk....
[  368.804046] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  368.804072] IP: [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804098] PGD 0
[  368.804114] Oops: 0002 [#1] SMP
[  368.804143] CPU 1
[  368.804151] Modules linked in: sg netconsole s3g(PO) uinput joydev hid_multitouch usbhid hid snd_hda_codec_via cpufreq_userspace cpufreq_powersave cpufreq_stats uhci_hcd cpufreq_conservative snd_hda_intel snd_hda_codec snd_hwdep snd_pcm sdhci_pci snd_page_alloc sdhci snd_timer snd psmouse evdev serio_raw pcspkr soundcore xhci_hcd shpchp s3g_drm(O) mvsas mmc_core ahci libahci drm i2c_core acpi_cpufreq mperf video processor button thermal_sys dm_dmirror exfat_fs exfat_core dm_zcache dm_mod padlock_aes aes_generic padlock_sha iscsi_target_mod target_core_mod configfs sswipe libsas libata scsi_transport_sas picdev via_cputemp hwmon_vid fuse parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif usb_storage scsi_mod ehci_hcd usbcore usb_common
[  368.804749]
[  368.804764] Pid: 392, comm: kworker/u:3 Tainted: P        W  O 3.4.87-logicube-ng.22 #1 To be filled by O.E.M. To be filled by O.E.M./EPIA-M920
[  368.804802] RIP: 0010:[<ffffffff81358457>]  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804827] RSP: 0018:ffff880117001cc0  EFLAGS: 00010246
[  368.804842] RAX: 0000000000000000 RBX: ffff8801185030d0 RCX: ffff88008edcb420
[  368.804857] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8801185030d4
[  368.804873] RBP: ffff8801181531c0 R08: 0000000000000020 R09: 00000000fffffffe
[  368.804885] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801185030d4
[  368.804899] R13: 0000000000000002 R14: ffff880117001fd8 R15: ffff8801185030d8
[  368.804916] FS:  0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[  368.804931] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  368.804946] CR2: 0000000000000000 CR3: 000000000160b000 CR4: 00000000000006e0
[  368.804962] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.804978] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.804995] Process kworker/u:3 (pid: 392, threadinfo ffff880117000000, task ffff8801181531c0)
[  368.805009] Stack:
[  368.805017]  ffff8801185030d8 0000000000000000 ffffffff8161ddf0 ffffffff81056f7c
[  368.805062]  000000000000b503 ffff8801185030d0 ffff880118503000 0000000000000000
[  368.805100]  ffff8801185030d0 ffff8801188b8000 ffff88008edcb420 ffffffff813583ac
[  368.805135] Call Trace:
[  368.805153]  [<ffffffff81056f7c>] ? up+0xb/0x33
[  368.805168]  [<ffffffff813583ac>] ? mutex_lock+0x16/0x25
[  368.805194]  [<ffffffffa018c414>] ? smp_execute_task+0x4e/0x222 [libsas]
[  368.805217]  [<ffffffffa018ce1c>] ? sas_find_bcast_dev+0x3c/0x15d [libsas]
[  368.805240]  [<ffffffffa018ce4f>] ? sas_find_bcast_dev+0x6f/0x15d [libsas]
[  368.805264]  [<ffffffffa018e989>] ? sas_ex_revalidate_domain+0x37/0x2ec [libsas]
[  368.805280]  [<ffffffff81355a2a>] ? printk+0x43/0x48
[  368.805296]  [<ffffffff81359a65>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[  368.805318]  [<ffffffffa018b767>] ? sas_revalidate_domain+0x85/0xb6 [libsas]
[  368.805336]  [<ffffffff8104e5d9>] ? process_one_work+0x151/0x27c
[  368.805351]  [<ffffffff8104f6cd>] ? worker_thread+0xbb/0x152
[  368.805366]  [<ffffffff8104f612>] ? manage_workers.isra.29+0x163/0x163
[  368.805382]  [<ffffffff81052c4e>] ? kthread+0x79/0x81
[  368.805399]  [<ffffffff8135fea4>] ? kernel_thread_helper+0x4/0x10
[  368.805416]  [<ffffffff81052bd5>] ? kthread_flush_work_fn+0x9/0x9
[  368.805431]  [<ffffffff8135fea0>] ? gs_change+0x13/0x13
[  368.805442] Code: 83 7d 30 63 7e 04 f3 90 eb ab 4c 8d 63 04 4c 8d 7b 08 4c 89 e7 e8 fa 15 00 00 48 8b 43 10 4c 89 3c 24 48 89 63 10 48 89 44 24 08 <48> 89 20 83 c8 ff 48 89 6c 24 10 87 03 ff c8 74 35 4d 89 ee 41
[  368.805851] RIP  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.805877]  RSP <ffff880117001cc0>
[  368.805886] CR2: 0000000000000000
[  368.805899] ---[ end trace b720682065d8f4cc ]---
--------------------------------------End Kernel crash dump-------------------------------
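
One reading of the dump, offered as an assumption rather than a
confirmed diagnosis: CR2 and RAX are both 0 and the error code 0002
indicates a write to a non-present page. That would fit a linked-list
insertion into a mutex wait list whose pointers were never initialized.
The userspace sketch below models that situation with a checked
insertion instead of a crash. The type and function names are
hypothetical and only mimic the shape of the kernel's list helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal doubly linked list node, shaped like the kernel's list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An initialized, empty list head points at itself in both directions. */
void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Append item before head. Returns false for a zeroed, never-initialized
 * head, which is where unchecked code would store through a NULL pointer.
 */
bool list_add_tail_checked(struct list_node *item, struct list_node *head)
{
    struct list_node *prev = head->prev;

    if (prev == NULL)
        return false;

    item->next = head;
    item->prev = prev;
    prev->next = item; /* this store faults when prev is NULL */
    head->prev = item;
    return true;
}
```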
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at  http://vger.kernel.org/majordomo-info.html