Doug, I'm attaching a new patch (again, against 2.6.9)... the primary difference from my previous patch is that this one now schedules a work item from sg_cmd_done. This work item drops the reference on the device, so this gets rid of the problems of calling scsi_device_put from sg_cmd_done or of sleeping in sg_release. Does this look okay? Nate > -----Original Message----- > From: Douglas Gilbert [mailto:dougg@xxxxxxxxxx] > Sent: Tuesday, October 18, 2005 12:55 AM > To: Dailey, Nate > Cc: linux-scsi@xxxxxxxxxxxxxxx > Subject: Re: requesting advice: oops when doing sg_dd IO and > device is removed > > > Nate, > I always suspected that if one really tried hard enough > then unexpectedly closing a file descriptor (e.g. via > control-C on an app) at the same time as there are > outstanding commands could cause problems. Obviously > you have succeeded demonstrating that with your experiment. > > When I fought with this problem in the lk 2.4 series, > someone skilled at finding flaws in locking logic > suggested there was no solution (at least the way > I was doing it) ... > > Waiting a close() call (i.e. sg_release() ), potentially > indefinitely, was very unpopular with application clients. > The sg driver did that in the old days, and I was abused > from several quarters (one name springs to mind). So I > tried to design sg_release() so it wouldn't ever hang > the application client. > > On reflection, I think to allow sg_release() to go > through quickly, another kernel thread is needed that > is passed ownership of unfinished commands. What do > you think? > > > Doug Gilbert > > > Dailey, Nate wrote: > > Doug (and anyone else who might be interested), I'm looking for some > > advice as to how best to fix a problem I hit in sg. > > > > I was doing the following: > > > > - start up several processes reading from an sg device with sg_dd > > - every other iteration, kill the processes > > - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete) > > - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan) > > > > This was on a 2.6.9 kernel, with a patch: > > > > http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2 > > > > That I've submitted to the LSML... patch was intended to > fix a different > > oops when doing the above test. > > > > Here's what I saw, after the device was removed: > > > > Debug: sleeping function called from invalid context at > > include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0 > > [<c011fcd1>] __might_sleep+0x7d/0x88 > > [<c021cb5e>] device_del+0x1e/0x90 > > [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod] > > [<c021c8e9>] device_release+0x11/0x40 > > [<c01bf03f>] kobject_cleanup+0x40/0x60 > > [<c01bf05f>] kobject_release+0x0/0x8 > > [<c01bf305>] kref_put+0x42/0x45 > > [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod] > > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod] > > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod] > > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod] > > [<c0126424>] __do_softirq+0x4c/0xb1 > > [<c010812f>] do_softirq+0x4f/0x56 > > ======================= > > [<c0107a44>] do_IRQ+0x1a2/0x1ae > > [<c02d1a8c>] common_interrupt+0x18/0x20 > > [<c0104018>] default_idle+0x0/0x2c > > [<c0104041>] default_idle+0x29/0x2c > > [<c010409d>] cpu_idle+0x26/0x3b > > [<c0390786>] start_kernel+0x199/0x19d > > > > (It looks like a command comes back, we drop the last > reference on the > > device, the device is freed... because we're in softirq, > and device_del > > may eventually sleep (in this kernel version, anyway), we > get this debug > > message. I don't think this should normally happen when a > command comes > > back, for reasons I'll discuss below) > > > > Continuing on... > > > > Badness in kref_get at lib/kref.c:32 > > [<c01bf2bb>] kref_get+0x24/0x2c > > [<c01beffb>] kobject_get+0xf/0x13 > > [<c021cb32>] get_device+0xe/0x14 > > [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod] > > [<c022213e>] blk_run_queue+0x20/0x2f > > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod] > > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod] > > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod] > > [<c0126424>] __do_softirq+0x4c/0xb1 > > [<c010812f>] do_softirq+0x4f/0x56 > > ======================= > > [<c0107a44>] do_IRQ+0x1a2/0x1ae > > [<c02d1a8c>] common_interrupt+0x18/0x20 > > [<c0104018>] default_idle+0x0/0x2c > > [<c0104041>] default_idle+0x29/0x2c > > [<c010409d>] cpu_idle+0x26/0x3b > > [<c0390786>] start_kernel+0x199/0x19d > > > > (So, there were more commands on the device's queue... > presumably for > > sg, since I wasn't doing any other IO to this disk. We're > trying to get > > a reference on the device, but the ref count has already > gone to 0 so > > this "Badness" message results.) > > > > Continuing on... > > > > Unable to handle kernel NULL pointer dereference at virtual address > > 00000008 > > printing eip: > > c0229678 > > *pde = 00004001 > > Oops: 0000 [#1] > > SMP > > Modules linked in: sg(U) parport_pc lp parport autofs4 nfs > lockd sunrpc > > md5 > > ip) > > CPU: 0 > > EIP: 0060:[<c0229678>] Tainted: PF VLI > > EFLAGS: 00010046 (2.6.9-20.ELsmp) > > EIP is at cfq_next_request+0x7/0x35 > > eax: f7b6f5e0 ebx: 00000000 ecx: c03249bc edx: f243a594 > > esi: f7b6f5e0 edi: f7f5b000 ebp: f7b6f5e0 esp: c03c7f80 > > ds: 007b es: 007b ss: 0068 > > Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80) > > Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 > f7f5b000 f883ef51 > > f7b6f5e0 > > 00000246 f243a400 00000000 c022213e f7d62800 > f3872800 f883a0a8 > > f7f5b000 > > f883ab15 00002002 f3872800 c03c7fd4 f883aa3a > c03c7fd4 c03c7fd4 > > 00000003 > > Call Trace: > > [<c02209dc>] elv_next_request+0xbe/0xce > > [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod] > > [<c022213e>] blk_run_queue+0x20/0x2f > > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod] > > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod] > > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod] > > [<c0126424>] __do_softirq+0x4c/0xb1 > > [<c010812f>] do_softirq+0x4f/0x56 > > ======================= > > [<c0107a44>] do_IRQ+0x1a2/0x1ae > > [<c02d1a8c>] common_interrupt+0x18/0x20 > > [<c0104018>] default_idle+0x0/0x2c > > [<c0104041>] default_idle+0x29/0x2c > > [<c010409d>] cpu_idle+0x26/0x3b > > [<c0390786>] start_kernel+0x199/0x19d > > Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 > d2 85 ff 0f > > 95 c2 > > > > (It was inevitable that something bad would happen when > trying to run > > the queue on a device which had been freed) > > > > > > My theory here is that sg should be holding a reference on the SCSI > > device until all IO comes back. With or without the patch I > mentioned > > above, this is not the case. Although sg_release does take > out an extra > > reference if there's any IO still outstanding, if sg_remove > is called, > > this reference is dropped. So after sg_remove, there could > still be IO > > to the device outstanding, but no reference on the device (from sg). > > This means the device could get freed even though there's > still some sg > > IO on the queue. > > > > Does this analysis sound reasonable? I'm a bit puzzled by > the fact that > > sd (for example) doesn't appear to have this problem (maybe > sd_release > > just never gets called until all IO comes back?). > > > > > > So, how to fix this? > > > > The first thing I tried was adding a kref to the Sg_device struct, > > patterned after the reference counting in sd, sr, st. In > sg_release, I > > would drop a reference only if there was no IO outstanding. If there > > _was_ IO outstanding, I would drop the reference in > sg_cmd_done when the > > last IO came back. In theory, this would ensure that sg > kept a reference > > on the device until all IO came back. > > > > First problem: I was protecting the kref with a mutex, as > sd, sr, etc. > > do. Trying to down the mutex from sg_cmd_done didn't work > (as this is > > done from softirq). Okay, so I took out the mutex, just to see how > > things would work. Still there was a problem... doing the final > > scsi_device_put in sg_cmd_done won't work, because > > scsi_device_dev_release calls device_del, which may sleep (in this > > kernel version, anyway... I think this has changed in later > kernels). So > > it seems trying to drop a reference when the last command > comes back is > > out of the question. > > > > The next thing I tried was simply sleeping in sg_release until all > > outstanding IO comes back (changing sg_remove_sfp to return > a count of > > outstanding IOs), and only then dropping the device reference (patch > > below). This seems to be working fine. But I'm wondering if this is > > really a good solution? I'm not sure I can think of > anything else... any > > ideas? > > > > Thanks! > > > > Nate Dailey > > Stratus Technologies
Attachment:
sg.c.ND102005.patch
Description: sg.c.ND102005.patch