Re: requesting advice: oops when doing sg_dd IO and device is removed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nate,
I always suspected that if one really tried hard enough
then unexpectedly closing a file descriptor (e.g. via
control-C on an app) at the same time as there are
outstanding commands could cause problems. Obviously
you have succeeded demonstrating that with your experiment.

When I fought with this problem in the lk 2.4 series,
someone skilled at finding flaws in locking logic
suggested there was no solution (at least the way
I was doing it) ...

Waiting a close() call (i.e. sg_release() ), potentially
indefinitely, was very unpopular with application clients.
The sg driver did that in the old days, and I was abused
from several quarters (one name springs to mind). So I
tried to design sg_release() so it wouldn't ever hang
the application client.

On reflection, I think to allow sg_release() to go
through quickly, another kernel thread is needed that
is passed ownership of unfinished commands. What do
you think?


Doug Gilbert


Dailey, Nate wrote:
> Doug (and anyone else who might be interested), I'm looking for some
> advice as to how best to fix a problem I hit in sg.
> 
> I was doing the following:
> 
> - start up several processes reading from an sg device with sg_dd
> - every other iteration, kill the processes
> - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
> - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
> 
> This was on a 2.6.9 kernel, with a patch:
> 
> http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
> 
> That I've submitted to the LSML... patch was intended to fix a different
> oops when doing the above test.
> 
> Here's what I saw, after the device was removed:
> 
> Debug: sleeping function called from invalid context at
> include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
>  [<c011fcd1>] __might_sleep+0x7d/0x88
>  [<c021cb5e>] device_del+0x1e/0x90
>  [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
>  [<c021c8e9>] device_release+0x11/0x40
>  [<c01bf03f>] kobject_cleanup+0x40/0x60
>  [<c01bf05f>] kobject_release+0x0/0x8
>  [<c01bf305>] kref_put+0x42/0x45
>  [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
>  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
>  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
>  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
>  [<c0126424>] __do_softirq+0x4c/0xb1
>  [<c010812f>] do_softirq+0x4f/0x56
>  =======================
>  [<c0107a44>] do_IRQ+0x1a2/0x1ae
>  [<c02d1a8c>] common_interrupt+0x18/0x20
>  [<c0104018>] default_idle+0x0/0x2c
>  [<c0104041>] default_idle+0x29/0x2c
>  [<c010409d>] cpu_idle+0x26/0x3b
>  [<c0390786>] start_kernel+0x199/0x19d
> 
> (It looks like a command comes back, we drop the last reference on the
> device, the device is freed... because we're in softirq, and device_del
> may eventually sleep (in this kernel version, anyway), we get this debug
> message. I don't think this should normally happen when a command comes
> back, for reasons I'll discuss below)
> 
> Continuing on...
> 
> Badness in kref_get at lib/kref.c:32
>  [<c01bf2bb>] kref_get+0x24/0x2c
>  [<c01beffb>] kobject_get+0xf/0x13
>  [<c021cb32>] get_device+0xe/0x14
>  [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
>  [<c022213e>] blk_run_queue+0x20/0x2f
>  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
>  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
>  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
>  [<c0126424>] __do_softirq+0x4c/0xb1
>  [<c010812f>] do_softirq+0x4f/0x56
>  =======================
>  [<c0107a44>] do_IRQ+0x1a2/0x1ae
>  [<c02d1a8c>] common_interrupt+0x18/0x20
>  [<c0104018>] default_idle+0x0/0x2c
>  [<c0104041>] default_idle+0x29/0x2c
>  [<c010409d>] cpu_idle+0x26/0x3b
>  [<c0390786>] start_kernel+0x199/0x19d
> 
> (So, there were more commands on the device's queue... presumably for
> sg, since I wasn't doing any other IO to this disk. We're trying to get
> a reference on the device, but the ref count has already gone to 0 so
> this "Badness" message results.)
> 
> Continuing on...
> 
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000008
>  printing eip:
> c0229678
> *pde = 00004001
> Oops: 0000 [#1]
> SMP
> Modules linked in: sg(U) parport_pc lp parport autofs4 nfs lockd sunrpc
> md5
>  ip)
> CPU:  0
> EIP:    0060:[<c0229678>]    Tainted: PF     VLI
> EFLAGS: 00010046   (2.6.9-20.ELsmp)
> EIP is at cfq_next_request+0x7/0x35
> eax: f7b6f5e0   ebx: 00000000   ecx: c03249bc   edx: f243a594
> esi: f7b6f5e0   edi: f7f5b000   ebp: f7b6f5e0   esp: c03c7f80
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
> Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 f7f5b000 f883ef51
> f7b6f5e0
>        00000246 f243a400 00000000 c022213e f7d62800 f3872800 f883a0a8
> f7f5b000
>        f883ab15 00002002 f3872800 c03c7fd4 f883aa3a c03c7fd4 c03c7fd4
> 00000003
> Call Trace:
>  [<c02209dc>] elv_next_request+0xbe/0xce
>  [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
>  [<c022213e>] blk_run_queue+0x20/0x2f
>  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
>  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
>  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
>  [<c0126424>] __do_softirq+0x4c/0xb1
>  [<c010812f>] do_softirq+0x4f/0x56
>  =======================
>  [<c0107a44>] do_IRQ+0x1a2/0x1ae
>  [<c02d1a8c>] common_interrupt+0x18/0x20
>  [<c0104018>] default_idle+0x0/0x2c
>  [<c0104041>] default_idle+0x29/0x2c
>  [<c010409d>] cpu_idle+0x26/0x3b
>  [<c0390786>] start_kernel+0x199/0x19d
> Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 d2 85 ff 0f
> 95 c2
> 
> (It was inevitable that something bad would happen when trying to run
> the queue on a device which had been freed)
> 
> 
> My theory here is that sg should be holding a reference on the SCSI
> device until all IO comes back. With or without the patch I mentioned
> above, this is not the case. Although sg_release does take out an extra
> reference if there's any IO still outstanding, if sg_remove is called,
> this reference is dropped. So after sg_remove, there could still be IO
> to the device outstanding, but no reference on the device (from sg).
> This means the device could get freed even though there's still some sg
> IO on the queue.
> 
> Does this analysis sound reasonable? I'm a bit puzzled by the fact that
> sd (for example) doesn't appear to have this problem (maybe sd_release
> just never gets called until all IO comes back?).
> 
> 
> So, how to fix this?
> 
> The first thing I tried was adding a kref to the Sg_device struct,
> patterned after the reference counting in sd, sr, st. In sg_release, I
> would drop a reference only if there was no IO outstanding. If there
> _was_ IO outstanding, I would drop the reference in sg_cmd_done when the
> last IO came back. In theory, this would ensure that sg kept a reference
> on the device until all IO came back.
> 
> First problem: I was protecting the kref with a mutex, as sd, sr, etc.
> do. Trying to down the mutex from sg_cmd_done didn't work (as this is
> done from softirq). Okay, so I took out the mutex, just to see how
> things would work. Still there was a problem... doing the final
> scsi_device_put in sg_cmd_done won't work, because
> scsi_device_dev_release calls device_del, which may sleep (in this
> kernel version, anyway... I think this has changed in later kernels). So
> it seems trying to drop a reference when the last command comes back is
> out of the question.
> 
> The next thing I tried was simply sleeping in sg_release until all
> outstanding IO comes back (changing sg_remove_sfp to return a count of
> outstanding IOs), and only then dropping the device reference (patch
> below). This seems to be working fine. But I'm wondering if this is
> really a good solution? I'm not sure I can think of anything else... any
> ideas?
> 
> Thanks!
> 
> Nate Dailey
> Stratus Technologies
> 
> 
> Here's the "wait in sg_release" patch (which I think is still useful,
> even if it's not strictly required). This is against a 2.6.9 kernel, but
> I can resubmit a patch against a later kernel if desired...
> 
> --- sg.c.orig   2005-10-13 16:50:00.806798387 -0400
> +++ sg.c        2005-10-13 16:07:06.365733548 -0400
> @@ -49,6 +49,7 @@ static int sg_version_num = 30531;    /* 2
>  #include <linux/seq_file.h>
>  #include <linux/blkdev.h>
>  #include <linux/delay.h>
> +#include <linux/kref.h>
> 
>  #include "scsi.h"
>  #include <scsi/scsi_host.h>
> @@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
>         char sgdebug;           /* 0->off, 1->sense, 9->dump dev, 10->
> all devs */
>         struct gendisk *disk;
>         struct cdev * cdev;     /* char_dev [sysfs:
> /sys/cdev/major/sg<n>] */
> +       struct kref kref;
>  } Sg_device;
> 
>  static int sg_fasync(int fd, struct file *filp, int mode);
> @@ -218,11 +220,61 @@ static Sg_device **sg_dev_arr = NULL;
>  static int sg_dev_max;
>  static int sg_nr_dev;
> 
> +/* This semaphore is used to mediate the 0->1 reference get in the
> + * face of object destruction (i.e. we can't allow a get on an
> + * object after last put) */
> +static DECLARE_MUTEX(sg_ref_sem);
> +
> +static void scsi_generic_release(struct kref *kref);
> +#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
> +
>  #define SZ_SG_HEADER sizeof(struct sg_header)
>  #define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
>  #define SZ_SG_IOVEC sizeof(sg_iovec_t)
>  #define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
> 
> +static Sg_device *scsi_generic_get(int dev)
> +{
> +       Sg_device *sdp = NULL;
> +
> +       down(&sg_ref_sem);
> +
> +       sdp = sg_get_dev (dev);
> +       if (!sdp) goto out;
> +
> +       if (sdp->detached) {
> +               sdp = NULL;
> +               goto out;
> +       }
> +
> +       kref_get(&sdp->kref);
> +
> +       if (!sdp->device)
> +               goto out_put;
> +
> +       if (scsi_device_get(sdp->device))
> +               goto out_put;
> +
> +       goto out;
> +
> +out_put:
> +       kref_put(&sdp->kref, scsi_generic_release);
> +       sdp = NULL;
> +out:
> +       up(&sg_ref_sem);
> +       return sdp;
> +}
> +
> +static void scsi_generic_put(Sg_device *sdp)
> +{
> +       struct scsi_device *sdev = sdp->device;
> +
> +       down(&sg_ref_sem);
> +       kref_put(&sdp->kref, scsi_generic_release);
> +       scsi_device_put(sdev);
> +       up(&sg_ref_sem);
> +}
> +
>  static int
>  sg_open(struct inode *inode, struct file *filp)
>  {
> @@ -235,17 +287,8 @@ sg_open(struct inode *inode, struct file
> 
>         nonseekable_open(inode, filp);
>         SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev,
> flags));
> -       sdp = sg_get_dev(dev);
> -       if ((!sdp) || (!sdp->device))
> +       if (!(sdp = scsi_generic_get(dev)))
>                 return -ENXIO;
> -       if (sdp->detached)
> -               return -ENODEV;
> -
> -       /* This driver's module count bumped by fops_get in <linux/fs.h>
> */
> -       /* Prevent the device driver from vanishing while we sleep */
> -       retval = scsi_device_get(sdp->device);
> -       if (retval)
> -               return retval;
> 
>         if (!((flags & O_NONBLOCK) ||
>               scsi_block_when_processing_errors(sdp->device))) {
> @@ -302,7 +345,7 @@ sg_open(struct inode *inode, struct file
>         return 0;
> 
>        error_out:
> -       scsi_device_put(sdp->device);
> +       scsi_generic_put(sdp);
>         return retval;
>  }
> 
> @@ -317,13 +360,14 @@ sg_release(struct inode *inode, struct f
>                 return -ENXIO;
>         SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n",
> sdp->disk->disk_name));
>         sg_fasync(-1, filp, 0); /* remove filp from async notification
> list */
> -       if (0 == sg_remove_sfp(sdp, sfp)) {     /* Returns 1 when sdp
> gone */
> -               if (!sdp->detached) {
> -                       scsi_device_put(sdp->device);
> -               }
> -               sdp->exclude = 0;
> -               wake_up_interruptible(&sdp->o_excl_wait);
> -       }
> +
> +       /* Don't release the device until all IO comes back. */
> +       while (sg_remove_sfp(sdp, sfp)){
> +               msleep(10);
> +       }
> +       sdp->exclude = 0;
> +       wake_up_interruptible(&sdp->o_excl_wait);
> +       scsi_generic_put(sdp);
>         return 0;
>  }
> 
> @@ -1238,7 +1285,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
>         sfp = srp->parentfp;
>         if (sfp)
>                 sdp = sfp->parentdp;
> -       if ((NULL == sdp) || sdp->detached) {
> +       if (NULL == sdp) {
>                 printk(KERN_INFO "sg_cmd_done: device detached\n");
>                 scsi_release_request(SRpnt);
>                 return;
> @@ -1292,18 +1339,8 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
>         scsi_release_request(SRpnt);
>         SRpnt = NULL;
> -       if (sfp->closed) {      /* whoops this fd already released,
> cleanup */
> -               SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed,
> freeing ...\n"));
> -               sg_finish_rem_req(srp);
> -               srp = NULL;
> -               if (NULL == sfp->headrp) {
> -                       SCSI_LOG_TIMEOUT(1, printk("sg...bh: already
> closed, final cleanup\n"));
> -                       if (0 == sg_remove_sfp(sdp, sfp)) {     /*
> device still present */
> -                               scsi_device_put(sdp->device);
> -                       }
> -                       sfp = NULL;
> -               }
> -       } else if (srp && srp->orphan) {
> +
> +       if (srp && srp->orphan) {
>                 if (sfp->keep_orphan)
>                         srp->sg_io_owned = 0;
>                 else {
> @@ -1441,6 +1478,7 @@ sg_add(struct class_device *cl_dev)
>         }
>         k = error;
>         sdp = sg_dev_arr[k];
> +       kref_init(&sdp->kref);
> 
>         devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
>                         S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
> @@ -1493,45 +1531,33 @@ sg_remove(struct class_device *cl_dev)
>         unsigned long iflags;
>         Sg_fd *sfp;
>         Sg_fd *tsfp;
> -       Sg_request *srp;
> -       Sg_request *tsrp;
> -       int k, delay;
> +       int k;
> 
>         if (NULL == sg_dev_arr)
>                 return;
> -       delay = 0;
>         write_lock_irqsave(&sg_dev_arr_lock, iflags);
>         for (k = 0; k < sg_dev_max; k++) {
> -               sdp = sg_dev_arr[k];
> -               if ((NULL == sdp) || (sdp->device != scsidp))
> +               if ((NULL == sg_dev_arr[k]) ||
> +                   (sg_dev_arr[k]->device != scsidp))
>                         continue;       /* dirty but lowers nesting */
> +               sdp = sg_dev_arr[k];
> +               sdp->detached = 1;
> +
>                 if (sdp->headfp) {
> -                       sdp->detached = 1;
>                         for (sfp = sdp->headfp; sfp; sfp = tsfp) {
>                                 tsfp = sfp->nextfp;
> -                               for (srp = sfp->headrp; srp; srp = tsrp)
> {
> -                                       tsrp = srp->nextrp;
> -                                       if (sfp->closed || (0 ==
> sg_srp_done(srp, sfp)))
> -                                               sg_finish_rem_req(srp);
> -                               }
> -                               if (sfp->closed) {
> -                                       scsi_device_put(sdp->device);
> -                                       __sg_remove_sfp(sdp, sfp);
> -                               } else {
> -                                       delay = 1;
> +                               if (!(sfp->closed)) {
>  
> wake_up_interruptible(&sfp->read_wait);
>                                         kill_fasync(&sfp->async_qp,
> SIGPOLL,
>                                                     POLL_HUP);
>                                 }
>                         }
>                         SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d,
> dirty\n", k));
> -                       if (NULL == sdp->headfp) {
> -                               sg_dev_arr[k] = NULL;
> -                       }
>                 } else {        /* nothing active, simple case */
>                         SCSI_LOG_TIMEOUT(3, printk("sg_detach:
> dev=%d\n", k));
> -                       sg_dev_arr[k] = NULL;
>                 }
> +
> +               sg_dev_arr[k] = NULL;
>                 sg_nr_dev--;
>                 break;
>         }
> @@ -1543,14 +1569,32 @@ sg_remove(struct class_device *cl_dev)
>                 cdev_del(sdp->cdev);
>                 sdp->cdev = NULL;
>                 devfs_remove("%s/generic", scsidp->devfs_name);
> -               put_disk(sdp->disk);
> -               sdp->disk = NULL;
> -               if (NULL == sdp->headfp)
> -                       kfree((char *) sdp);
> +
> +               down(&sg_ref_sem);
> +               kref_put(&sdp->kref, scsi_generic_release);
> +               up(&sg_ref_sem);
>         }
> +}
> +
> +/**
> + *      scsi_generic_release - Called to free the Sg_device structure
> + *      @kref: pointer to embedded kref
> + *
> + *      sg_ref_sem must be held entering this routine.  Because it is
> + *      called on last put, you should always use the
> scsi_generic_get()
> + *      and scsi_generic_put() helpers which manipulate the semaphore
> + *      directly and never do a direct kref_put().
> + **/
> +static void scsi_generic_release(struct kref *kref)
> +{
> +       Sg_device *sdp = to_scsi_generic(kref);
> +       struct gendisk *disk = sdp->disk;
> +
> +       disk->private_data = NULL;
> +       put_disk(disk);
> 
> -       if (delay)
> -               msleep(10);     /* dirty detach so delay device
> destruction */
> +       kfree((char *) sdp);
> +       return;
>  }
> 
>  /* Set 'perm' (4th argument) to 0 to disable module_param's definition
> @@ -2484,14 +2528,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
>         sg_page_free((char *) sfp, sizeof (Sg_fd));
>  }
> 
> -/* Returns 0 in normal case, 1 when detached and sdp object removed */
> +/* Returns a count of IOs still in progress. */
>  static int
>  sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
>  {
>         Sg_request *srp;
>         Sg_request *tsrp;
>         int dirty = 0;
> -       int res = 0;
> 
>         for (srp = sfp->headrp; srp; srp = tsrp) {
>                 tsrp = srp->nextrp;
> @@ -2505,30 +2548,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s
> 
>                 write_lock_irqsave(&sg_dev_arr_lock, iflags);
>                 __sg_remove_sfp(sdp, sfp);
> -               if (sdp->detached && (NULL == sdp->headfp)) {
> -                       int k, maxd;
> -
> -                       maxd = sg_dev_max;
> -                       for (k = 0; k < maxd; ++k) {
> -                               if (sdp == sg_dev_arr[k])
> -                                       break;
> -                       }
> -                       if (k < maxd)
> -                               sg_dev_arr[k] = NULL;
> -                       kfree((char *) sdp);
> -                       res = 1;
> -               }
>                 write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
>         } else {
> -               /* MOD_INC's to inhibit unloading sg and associated
> adapter driver */
> -               /* only bump the access_count if we actually succeeded
> in
> -                * throwing another counter on the host module */
> -               scsi_device_get(sdp->device);   /* XXX: retval ignored?
> */
> +               /* The non-zero dirty count will be returned. This will
> tell
> +                  the caller not to drop the reference on this device.
> */
> +
>                 sfp->closed = 1;        /* flag dirty state on this fd
> */
>                 SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d
> writes pending\n",
>                                   dirty));
>         }
> -       return res;
> +       return dirty;
>  }
> 
>  static int
> -
> : send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux