RE: requesting advice: oops when doing sg_dd IO and device is removed

"Dailey, Nate" <Nate.Dailey@xxxxxxxxxxx> · Thu, 20 Oct 2005 12:48:21 -0400

Doug, I'm attaching a new patch (again, against 2.6.9)... the primary
difference from my previous patch is that this one now schedules a work
item from sg_cmd_done. This work item drops the reference on the device,
so this gets rid of the problems of calling scsi_device_put from
sg_cmd_done or of sleeping in sg_release. Does this look okay?

Nate

> -----Original Message-----
> From: Douglas Gilbert [mailto:dougg@xxxxxxxxxx] 
> Sent: Tuesday, October 18, 2005 12:55 AM
> To: Dailey, Nate
> Cc: linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: requesting advice: oops when doing sg_dd IO and 
> device is removed
> 
> 
> Nate,
> I always suspected that if one really tried hard enough
> then unexpectedly closing a file descriptor (e.g. via
> control-C on an app) at the same time as there are
> outstanding commands could cause problems. Obviously
> you have succeeded demonstrating that with your experiment.
> 
> When I fought with this problem in the lk 2.4 series,
> someone skilled at finding flaws in locking logic
> suggested there was no solution (at least the way
> I was doing it) ...
> 
> Waiting a close() call (i.e. sg_release() ), potentially
> indefinitely, was very unpopular with application clients.
> The sg driver did that in the old days, and I was abused
> from several quarters (one name springs to mind). So I
> tried to design sg_release() so it wouldn't ever hang
> the application client.
> 
> On reflection, I think to allow sg_release() to go
> through quickly, another kernel thread is needed that
> is passed ownership of unfinished commands. What do
> you think?
> 
> 
> Doug Gilbert
> 
> 
> Dailey, Nate wrote:
> > Doug (and anyone else who might be interested), I'm looking for some
> > advice as to how best to fix a problem I hit in sg.
> > 
> > I was doing the following:
> > 
> > - start up several processes reading from an sg device with sg_dd
> > - every other iteration, kill the processes
> > - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
> > - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
> > 
> > This was on a 2.6.9 kernel, with a patch:
> > 
> > http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
> > 
> > That I've submitted to the LSML... patch was intended to 
> fix a different
> > oops when doing the above test.
> > 
> > Here's what I saw, after the device was removed:
> > 
> > Debug: sleeping function called from invalid context at
> > include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
> >  [<c011fcd1>] __might_sleep+0x7d/0x88
> >  [<c021cb5e>] device_del+0x1e/0x90
> >  [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
> >  [<c021c8e9>] device_release+0x11/0x40
> >  [<c01bf03f>] kobject_cleanup+0x40/0x60
> >  [<c01bf05f>] kobject_release+0x0/0x8
> >  [<c01bf305>] kref_put+0x42/0x45
> >  [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
> >  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> >  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> >  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> >  [<c0126424>] __do_softirq+0x4c/0xb1
> >  [<c010812f>] do_softirq+0x4f/0x56
> >  =======================
> >  [<c0107a44>] do_IRQ+0x1a2/0x1ae
> >  [<c02d1a8c>] common_interrupt+0x18/0x20
> >  [<c0104018>] default_idle+0x0/0x2c
> >  [<c0104041>] default_idle+0x29/0x2c
> >  [<c010409d>] cpu_idle+0x26/0x3b
> >  [<c0390786>] start_kernel+0x199/0x19d
> > 
> > (It looks like a command comes back, we drop the last 
> reference on the
> > device, the device is freed... because we're in softirq, 
> and device_del
> > may eventually sleep (in this kernel version, anyway), we 
> get this debug
> > message. I don't think this should normally happen when a 
> command comes
> > back, for reasons I'll discuss below)
> > 
> > Continuing on...
> > 
> > Badness in kref_get at lib/kref.c:32
> >  [<c01bf2bb>] kref_get+0x24/0x2c
> >  [<c01beffb>] kobject_get+0xf/0x13
> >  [<c021cb32>] get_device+0xe/0x14
> >  [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
> >  [<c022213e>] blk_run_queue+0x20/0x2f
> >  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> >  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> >  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> >  [<c0126424>] __do_softirq+0x4c/0xb1
> >  [<c010812f>] do_softirq+0x4f/0x56
> >  =======================
> >  [<c0107a44>] do_IRQ+0x1a2/0x1ae
> >  [<c02d1a8c>] common_interrupt+0x18/0x20
> >  [<c0104018>] default_idle+0x0/0x2c
> >  [<c0104041>] default_idle+0x29/0x2c
> >  [<c010409d>] cpu_idle+0x26/0x3b
> >  [<c0390786>] start_kernel+0x199/0x19d
> > 
> > (So, there were more commands on the device's queue... 
> presumably for
> > sg, since I wasn't doing any other IO to this disk. We're 
> trying to get
> > a reference on the device, but the ref count has already 
> gone to 0 so
> > this "Badness" message results.)
> > 
> > Continuing on...
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000008
> >  printing eip:
> > c0229678
> > *pde = 00004001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: sg(U) parport_pc lp parport autofs4 nfs 
> lockd sunrpc
> > md5
> >  ip)
> > CPU:  0
> > EIP:    0060:[<c0229678>]    Tainted: PF     VLI
> > EFLAGS: 00010046   (2.6.9-20.ELsmp)
> > EIP is at cfq_next_request+0x7/0x35
> > eax: f7b6f5e0   ebx: 00000000   ecx: c03249bc   edx: f243a594
> > esi: f7b6f5e0   edi: f7f5b000   ebp: f7b6f5e0   esp: c03c7f80
> > ds: 007b   es: 007b   ss: 0068
> > Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
> > Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 
> f7f5b000 f883ef51
> > f7b6f5e0
> >        00000246 f243a400 00000000 c022213e f7d62800 
> f3872800 f883a0a8
> > f7f5b000
> >        f883ab15 00002002 f3872800 c03c7fd4 f883aa3a 
> c03c7fd4 c03c7fd4
> > 00000003
> > Call Trace:
> >  [<c02209dc>] elv_next_request+0xbe/0xce
> >  [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
> >  [<c022213e>] blk_run_queue+0x20/0x2f
> >  [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> >  [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> >  [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> >  [<c0126424>] __do_softirq+0x4c/0xb1
> >  [<c010812f>] do_softirq+0x4f/0x56
> >  =======================
> >  [<c0107a44>] do_IRQ+0x1a2/0x1ae
> >  [<c02d1a8c>] common_interrupt+0x18/0x20
> >  [<c0104018>] default_idle+0x0/0x2c
> >  [<c0104041>] default_idle+0x29/0x2c
> >  [<c010409d>] cpu_idle+0x26/0x3b
> >  [<c0390786>] start_kernel+0x199/0x19d
> > Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 
> d2 85 ff 0f
> > 95 c2
> > 
> > (It was inevitable that something bad would happen when 
> trying to run
> > the queue on a device which had been freed)
> > 
> > 
> > My theory here is that sg should be holding a reference on the SCSI
> > device until all IO comes back. With or without the patch I 
> mentioned
> > above, this is not the case. Although sg_release does take 
> out an extra
> > reference if there's any IO still outstanding, if sg_remove 
> is called,
> > this reference is dropped. So after sg_remove, there could 
> still be IO
> > to the device outstanding, but no reference on the device (from sg).
> > This means the device could get freed even though there's 
> still some sg
> > IO on the queue.
> > 
> > Does this analysis sound reasonable? I'm a bit puzzled by 
> the fact that
> > sd (for example) doesn't appear to have this problem (maybe 
> sd_release
> > just never gets called until all IO comes back?).
> > 
> > 
> > So, how to fix this?
> > 
> > The first thing I tried was adding a kref to the Sg_device struct,
> > patterned after the reference counting in sd, sr, st. In 
> sg_release, I
> > would drop a reference only if there was no IO outstanding. If there
> > _was_ IO outstanding, I would drop the reference in 
> sg_cmd_done when the
> > last IO came back. In theory, this would ensure that sg 
> kept a reference
> > on the device until all IO came back.
> > 
> > First problem: I was protecting the kref with a mutex, as 
> sd, sr, etc.
> > do. Trying to down the mutex from sg_cmd_done didn't work 
> (as this is
> > done from softirq). Okay, so I took out the mutex, just to see how
> > things would work. Still there was a problem... doing the final
> > scsi_device_put in sg_cmd_done won't work, because
> > scsi_device_dev_release calls device_del, which may sleep (in this
> > kernel version, anyway... I think this has changed in later 
> kernels). So
> > it seems trying to drop a reference when the last command 
> comes back is
> > out of the question.
> > 
> > The next thing I tried was simply sleeping in sg_release until all
> > outstanding IO comes back (changing sg_remove_sfp to return 
> a count of
> > outstanding IOs), and only then dropping the device reference (patch
> > below). This seems to be working fine. But I'm wondering if this is
> > really a good solution? I'm not sure I can think of 
> anything else... any
> > ideas?
> > 
> > Thanks!
> > 
> > Nate Dailey
> > Stratus Technologies
Attachment:
sg.c.ND102005.patch

Description: sg.c.ND102005.patch