requesting advice: oops when doing sg_dd IO and device is removed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Doug (and anyone else who might be interested), I'm looking for some
advice as to how best to fix a problem I hit in sg.

I was doing the following:

- start up several processes reading from an sg device with sg_dd
- every other iteration, kill the processes
- remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
- add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)

This was on a 2.6.9 kernel, with a patch:

http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2

That I've submitted to the LSML... patch was intended to fix a different
oops when doing the above test.

Here's what I saw, after the device was removed:

Debug: sleeping function called from invalid context at
include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
 [<c011fcd1>] __might_sleep+0x7d/0x88
 [<c021cb5e>] device_del+0x1e/0x90
 [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
 [<c021c8e9>] device_release+0x11/0x40
 [<c01bf03f>] kobject_cleanup+0x40/0x60
 [<c01bf05f>] kobject_release+0x0/0x8
 [<c01bf305>] kref_put+0x42/0x45
 [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
 [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
 [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
 [<c0126424>] __do_softirq+0x4c/0xb1
 [<c010812f>] do_softirq+0x4f/0x56
 =======================
 [<c0107a44>] do_IRQ+0x1a2/0x1ae
 [<c02d1a8c>] common_interrupt+0x18/0x20
 [<c0104018>] default_idle+0x0/0x2c
 [<c0104041>] default_idle+0x29/0x2c
 [<c010409d>] cpu_idle+0x26/0x3b
 [<c0390786>] start_kernel+0x199/0x19d

(It looks like a command comes back, we drop the last reference on the
device, the device is freed... because we're in softirq, and device_del
may eventually sleep (in this kernel version, anyway), we get this debug
message. I don't think this should normally happen when a command comes
back, for reasons I'll discuss below)

Continuing on...

Badness in kref_get at lib/kref.c:32
 [<c01bf2bb>] kref_get+0x24/0x2c
 [<c01beffb>] kobject_get+0xf/0x13
 [<c021cb32>] get_device+0xe/0x14
 [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
 [<c022213e>] blk_run_queue+0x20/0x2f
 [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
 [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
 [<c0126424>] __do_softirq+0x4c/0xb1
 [<c010812f>] do_softirq+0x4f/0x56
 =======================
 [<c0107a44>] do_IRQ+0x1a2/0x1ae
 [<c02d1a8c>] common_interrupt+0x18/0x20
 [<c0104018>] default_idle+0x0/0x2c
 [<c0104041>] default_idle+0x29/0x2c
 [<c010409d>] cpu_idle+0x26/0x3b
 [<c0390786>] start_kernel+0x199/0x19d

(So, there were more commands on the device's queue... presumably for
sg, since I wasn't doing any other IO to this disk. We're trying to get
a reference on the device, but the ref count has already gone to 0 so
this "Badness" message results.)

Continuing on...

Unable to handle kernel NULL pointer dereference at virtual address
00000008
 printing eip:
c0229678
*pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: sg(U) parport_pc lp parport autofs4 nfs lockd sunrpc
md5
 ip)
CPU:  0
EIP:    0060:[<c0229678>]    Tainted: PF     VLI
EFLAGS: 00010046   (2.6.9-20.ELsmp)
EIP is at cfq_next_request+0x7/0x35
eax: f7b6f5e0   ebx: 00000000   ecx: c03249bc   edx: f243a594
esi: f7b6f5e0   edi: f7f5b000   ebp: f7b6f5e0   esp: c03c7f80
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 f7f5b000 f883ef51
f7b6f5e0
       00000246 f243a400 00000000 c022213e f7d62800 f3872800 f883a0a8
f7f5b000
       f883ab15 00002002 f3872800 c03c7fd4 f883aa3a c03c7fd4 c03c7fd4
00000003
Call Trace:
 [<c02209dc>] elv_next_request+0xbe/0xce
 [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
 [<c022213e>] blk_run_queue+0x20/0x2f
 [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
 [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
 [<c0126424>] __do_softirq+0x4c/0xb1
 [<c010812f>] do_softirq+0x4f/0x56
 =======================
 [<c0107a44>] do_IRQ+0x1a2/0x1ae
 [<c02d1a8c>] common_interrupt+0x18/0x20
 [<c0104018>] default_idle+0x0/0x2c
 [<c0104041>] default_idle+0x29/0x2c
 [<c010409d>] cpu_idle+0x26/0x3b
 [<c0390786>] start_kernel+0x199/0x19d
Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 d2 85 ff 0f
95 c2

(It was inevitable that something bad would happen when trying to run
the queue on a device which had been freed)


My theory here is that sg should be holding a reference on the SCSI
device until all IO comes back. With or without the patch I mentioned
above, this is not the case. Although sg_release does take out an extra
reference if there's any IO still outstanding, if sg_remove is called,
this reference is dropped. So after sg_remove, there could still be IO
to the device outstanding, but no reference on the device (from sg).
This means the device could get freed even though there's still some sg
IO on the queue.

Does this analysis sound reasonable? I'm a bit puzzled by the fact that
sd (for example) doesn't appear to have this problem (maybe sd_release
just never gets called until all IO comes back?).


So, how to fix this?

The first thing I tried was adding a kref to the Sg_device struct,
patterned after the reference counting in sd, sr, st. In sg_release, I
would drop a reference only if there was no IO outstanding. If there
_was_ IO outstanding, I would drop the reference in sg_cmd_done when the
last IO came back. In theory, this would ensure that sg kept a reference
on the device until all IO came back.

First problem: I was protecting the kref with a mutex, as sd, sr, etc.
do. Trying to down the mutex from sg_cmd_done didn't work (as this is
done from softirq). Okay, so I took out the mutex, just to see how
things would work. Still there was a problem... doing the final
scsi_device_put in sg_cmd_done won't work, because
scsi_device_dev_release calls device_del, which may sleep (in this
kernel version, anyway... I think this has changed in later kernels). So
it seems trying to drop a reference when the last command comes back is
out of the question.

The next thing I tried was simply sleeping in sg_release until all
outstanding IO comes back (changing sg_remove_sfp to return a count of
outstanding IOs), and only then dropping the device reference (patch
below). This seems to be working fine. But I'm wondering if this is
really a good solution? I'm not sure I can think of anything else... any
ideas?

Thanks!

Nate Dailey
Stratus Technologies


Here's the "wait in sg_release" patch (which I think is still useful,
even if it's not strictly required). This is against a 2.6.9 kernel, but
I can resubmit a patch against a later kernel if desired...

--- sg.c.orig   2005-10-13 16:50:00.806798387 -0400
+++ sg.c        2005-10-13 16:07:06.365733548 -0400
@@ -49,6 +49,7 @@ static int sg_version_num = 30531;    /* 2
 #include <linux/seq_file.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/kref.h>

 #include "scsi.h"
 #include <scsi/scsi_host.h>
@@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
        char sgdebug;           /* 0->off, 1->sense, 9->dump dev, 10->
all devs */
        struct gendisk *disk;
        struct cdev * cdev;     /* char_dev [sysfs:
/sys/cdev/major/sg<n>] */
+       struct kref kref;
 } Sg_device;

 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -218,11 +220,61 @@ static Sg_device **sg_dev_arr = NULL;
 static int sg_dev_max;
 static int sg_nr_dev;

+/* This semaphore is used to mediate the 0->1 reference get in the
+ * face of object destruction (i.e. we can't allow a get on an
+ * object after last put) */
+static DECLARE_MUTEX(sg_ref_sem);
+
+static void scsi_generic_release(struct kref *kref);
+#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
+
 #define SZ_SG_HEADER sizeof(struct sg_header)
 #define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
 #define SZ_SG_IOVEC sizeof(sg_iovec_t)
 #define SZ_SG_REQ_INFO sizeof(sg_req_info_t)

+static Sg_device *scsi_generic_get(int dev)
+{
+       Sg_device *sdp = NULL;
+
+       down(&sg_ref_sem);
+
+       sdp = sg_get_dev (dev);
+       if (!sdp) goto out;
+
+       if (sdp->detached) {
+               sdp = NULL;
+               goto out;
+       }
+
+       kref_get(&sdp->kref);
+
+       if (!sdp->device)
+               goto out_put;
+
+       if (scsi_device_get(sdp->device))
+               goto out_put;
+
+       goto out;
+
+out_put:
+       kref_put(&sdp->kref, scsi_generic_release);
+       sdp = NULL;
+out:
+       up(&sg_ref_sem);
+       return sdp;
+}
+
+static void scsi_generic_put(Sg_device *sdp)
+{
+       struct scsi_device *sdev = sdp->device;
+
+       down(&sg_ref_sem);
+       kref_put(&sdp->kref, scsi_generic_release);
+       scsi_device_put(sdev);
+       up(&sg_ref_sem);
+}
+
 static int
 sg_open(struct inode *inode, struct file *filp)
 {
@@ -235,17 +287,8 @@ sg_open(struct inode *inode, struct file

        nonseekable_open(inode, filp);
        SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev,
flags));
-       sdp = sg_get_dev(dev);
-       if ((!sdp) || (!sdp->device))
+       if (!(sdp = scsi_generic_get(dev)))
                return -ENXIO;
-       if (sdp->detached)
-               return -ENODEV;
-
-       /* This driver's module count bumped by fops_get in <linux/fs.h>
*/
-       /* Prevent the device driver from vanishing while we sleep */
-       retval = scsi_device_get(sdp->device);
-       if (retval)
-               return retval;

        if (!((flags & O_NONBLOCK) ||
              scsi_block_when_processing_errors(sdp->device))) {
@@ -302,7 +345,7 @@ sg_open(struct inode *inode, struct file
        return 0;

       error_out:
-       scsi_device_put(sdp->device);
+       scsi_generic_put(sdp);
        return retval;
 }

@@ -317,13 +360,14 @@ sg_release(struct inode *inode, struct f
                return -ENXIO;
        SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n",
sdp->disk->disk_name));
        sg_fasync(-1, filp, 0); /* remove filp from async notification
list */
-       if (0 == sg_remove_sfp(sdp, sfp)) {     /* Returns 1 when sdp
gone */
-               if (!sdp->detached) {
-                       scsi_device_put(sdp->device);
-               }
-               sdp->exclude = 0;
-               wake_up_interruptible(&sdp->o_excl_wait);
-       }
+
+       /* Don't release the device until all IO comes back. */
+       while (sg_remove_sfp(sdp, sfp)){
+               msleep(10);
+       }
+       sdp->exclude = 0;
+       wake_up_interruptible(&sdp->o_excl_wait);
+       scsi_generic_put(sdp);
        return 0;
 }

@@ -1238,7 +1285,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
        sfp = srp->parentfp;
        if (sfp)
                sdp = sfp->parentdp;
-       if ((NULL == sdp) || sdp->detached) {
+       if (NULL == sdp) {
                printk(KERN_INFO "sg_cmd_done: device detached\n");
                scsi_release_request(SRpnt);
                return;
@@ -1292,18 +1339,8 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
        scsi_release_request(SRpnt);
        SRpnt = NULL;
-       if (sfp->closed) {      /* whoops this fd already released,
cleanup */
-               SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed,
freeing ...\n"));
-               sg_finish_rem_req(srp);
-               srp = NULL;
-               if (NULL == sfp->headrp) {
-                       SCSI_LOG_TIMEOUT(1, printk("sg...bh: already
closed, final cleanup\n"));
-                       if (0 == sg_remove_sfp(sdp, sfp)) {     /*
device still present */
-                               scsi_device_put(sdp->device);
-                       }
-                       sfp = NULL;
-               }
-       } else if (srp && srp->orphan) {
+
+       if (srp && srp->orphan) {
                if (sfp->keep_orphan)
                        srp->sg_io_owned = 0;
                else {
@@ -1441,6 +1478,7 @@ sg_add(struct class_device *cl_dev)
        }
        k = error;
        sdp = sg_dev_arr[k];
+       kref_init(&sdp->kref);

        devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
                        S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
@@ -1493,45 +1531,33 @@ sg_remove(struct class_device *cl_dev)
        unsigned long iflags;
        Sg_fd *sfp;
        Sg_fd *tsfp;
-       Sg_request *srp;
-       Sg_request *tsrp;
-       int k, delay;
+       int k;

        if (NULL == sg_dev_arr)
                return;
-       delay = 0;
        write_lock_irqsave(&sg_dev_arr_lock, iflags);
        for (k = 0; k < sg_dev_max; k++) {
-               sdp = sg_dev_arr[k];
-               if ((NULL == sdp) || (sdp->device != scsidp))
+               if ((NULL == sg_dev_arr[k]) ||
+                   (sg_dev_arr[k]->device != scsidp))
                        continue;       /* dirty but lowers nesting */
+               sdp = sg_dev_arr[k];
+               sdp->detached = 1;
+
                if (sdp->headfp) {
-                       sdp->detached = 1;
                        for (sfp = sdp->headfp; sfp; sfp = tsfp) {
                                tsfp = sfp->nextfp;
-                               for (srp = sfp->headrp; srp; srp = tsrp)
{
-                                       tsrp = srp->nextrp;
-                                       if (sfp->closed || (0 ==
sg_srp_done(srp, sfp)))
-                                               sg_finish_rem_req(srp);
-                               }
-                               if (sfp->closed) {
-                                       scsi_device_put(sdp->device);
-                                       __sg_remove_sfp(sdp, sfp);
-                               } else {
-                                       delay = 1;
+                               if (!(sfp->closed)) {
 
wake_up_interruptible(&sfp->read_wait);
                                        kill_fasync(&sfp->async_qp,
SIGPOLL,
                                                    POLL_HUP);
                                }
                        }
                        SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d,
dirty\n", k));
-                       if (NULL == sdp->headfp) {
-                               sg_dev_arr[k] = NULL;
-                       }
                } else {        /* nothing active, simple case */
                        SCSI_LOG_TIMEOUT(3, printk("sg_detach:
dev=%d\n", k));
-                       sg_dev_arr[k] = NULL;
                }
+
+               sg_dev_arr[k] = NULL;
                sg_nr_dev--;
                break;
        }
@@ -1543,14 +1569,32 @@ sg_remove(struct class_device *cl_dev)
                cdev_del(sdp->cdev);
                sdp->cdev = NULL;
                devfs_remove("%s/generic", scsidp->devfs_name);
-               put_disk(sdp->disk);
-               sdp->disk = NULL;
-               if (NULL == sdp->headfp)
-                       kfree((char *) sdp);
+
+               down(&sg_ref_sem);
+               kref_put(&sdp->kref, scsi_generic_release);
+               up(&sg_ref_sem);
        }
+}
+
+/**
+ *      scsi_generic_release - Called to free the Sg_device structure
+ *      @kref: pointer to embedded kref
+ *
+ *      sg_ref_sem must be held entering this routine.  Because it is
+ *      called on last put, you should always use the
scsi_generic_get()
+ *      and scsi_generic_put() helpers which manipulate the semaphore
+ *      directly and never do a direct kref_put().
+ **/
+static void scsi_generic_release(struct kref *kref)
+{
+       Sg_device *sdp = to_scsi_generic(kref);
+       struct gendisk *disk = sdp->disk;
+
+       disk->private_data = NULL;
+       put_disk(disk);

-       if (delay)
-               msleep(10);     /* dirty detach so delay device
destruction */
+       kfree((char *) sdp);
+       return;
 }

 /* Set 'perm' (4th argument) to 0 to disable module_param's definition
@@ -2484,14 +2528,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
        sg_page_free((char *) sfp, sizeof (Sg_fd));
 }

-/* Returns 0 in normal case, 1 when detached and sdp object removed */
+/* Returns a count of IOs still in progress. */
 static int
 sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
 {
        Sg_request *srp;
        Sg_request *tsrp;
        int dirty = 0;
-       int res = 0;

        for (srp = sfp->headrp; srp; srp = tsrp) {
                tsrp = srp->nextrp;
@@ -2505,30 +2548,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s

                write_lock_irqsave(&sg_dev_arr_lock, iflags);
                __sg_remove_sfp(sdp, sfp);
-               if (sdp->detached && (NULL == sdp->headfp)) {
-                       int k, maxd;
-
-                       maxd = sg_dev_max;
-                       for (k = 0; k < maxd; ++k) {
-                               if (sdp == sg_dev_arr[k])
-                                       break;
-                       }
-                       if (k < maxd)
-                               sg_dev_arr[k] = NULL;
-                       kfree((char *) sdp);
-                       res = 1;
-               }
                write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
        } else {
-               /* MOD_INC's to inhibit unloading sg and associated
adapter driver */
-               /* only bump the access_count if we actually succeeded
in
-                * throwing another counter on the host module */
-               scsi_device_get(sdp->device);   /* XXX: retval ignored?
*/
+               /* The non-zero dirty count will be returned. This will
tell
+                  the caller not to drop the reference on this device.
*/
+
                sfp->closed = 1;        /* flag dirty state on this fd
*/
                SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d
writes pending\n",
                                  dirty));
        }
-       return res;
+       return dirty;
 }

 static int
-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux