Re: [linux-usb-devel] oops on usb storage device disconnect with 2.6.14-rc1

Mike Anderson <andmike@xxxxxxxxxx> · Thu, 15 Sep 2005 16:55:28 -0700

James Bottomley <James.Bottomley@xxxxxxxxxxxx> wrote:
> On Thu, 2005-09-15 at 15:19 -0700, Mike Anderson wrote:
> > A side effect of not applying Alan's previous patch that added
> > SHOST_RECOVERY to the SHOST_CANCEL: state is that we will not move to the
> > SHOST_CANCEL and subsequently not to SHOST_DEL state if the eh is active
> > during the start of scsi_remove_host. I sent mail on the 7th asking you to
> > include that state change hunk of the diff, but I guess that overlapped
> > with your newer state changes.
> > http://marc.theaimsgroup.com/?l=linux-scsi&m=112238726326927&w=2
> 
> Yes, but that's not really legitimate since it introduces a bifurcation
> in the state machine ... when the eh terminates it will come back to
> running even if it went in from cancel.

I'd like some clarification here, as I do not see the split in the state
machine or the transition back to running from cancel. If the above patch
is applied, we can transition to cancel from recovery if the eh has already
started. When the eh completes, the transition to running in
scsi_restart_operations will fail because we are in the cancel state.
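To make the transitions above concrete, here is a simplified model of the state check as a C sketch. It assumes a hypothetical enum host_state and host_set_state() covering only the four states discussed in this thread; it is not the kernel's scsi_host_set_state() or its full transition table. The CANCEL case accepts RECOVERY as an old state (the hunk from Alan's patch), and the RUNNING case only accepts RECOVERY, so an eh that finishes after a cancel is refused.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the host states discussed in this
 * thread; not the kernel's enum scsi_host_state or transition table. */
enum host_state {
	HOST_RUNNING,
	HOST_RECOVERY,
	HOST_CANCEL,
	HOST_DEL,
};

/* Returns true and updates *state if old -> new_state is allowed. */
static bool host_set_state(enum host_state *state, enum host_state new_state)
{
	enum host_state old = *state;
	bool ok = false;

	switch (new_state) {
	case HOST_RUNNING:
		/* Only a finishing eh returns the host to running. */
		ok = (old == HOST_RECOVERY);
		break;
	case HOST_RECOVERY:
		ok = (old == HOST_RUNNING);
		break;
	case HOST_CANCEL:
		/* Accepting RECOVERY here models the hunk from Alan's patch. */
		ok = (old == HOST_RUNNING || old == HOST_RECOVERY);
		break;
	case HOST_DEL:
		ok = (old == HOST_CANCEL);
		break;
	}
	if (ok)
		*state = new_state;
	return ok;
}
```

Walking it through: RUNNING to RECOVERY to CANCEL succeeds, the eh's later attempt to go back to RUNNING returns false and leaves the host in CANCEL, and CANCEL to DEL still works, so removal is not blocked.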

That said, I like the idea below of waiting for / terminating the eh thread
prior to transitioning to cancel. It does introduce some asymmetry in
scsi_remove_host, as the eh thread is created in scsi_host_alloc, but
later patches could possibly move the eh creation to scsi_add_host
(unless I forgot the reason it needed to be earlier).

> 
> > In looking at the state model introduced by your patch I believe there may
> > still be a state model race issue if the recovery completes just after
> > the "if (!scsi_host_set_state(shost, SHOST_CANCEL))" call in
> > scsi_remove_host (maybe I am just looking too quickly at the state
> > updates).
> 
> No, that's true; there's a tiny race that can be mediated by doing
> locking around the state changes ... that was one of the feedback
> comments from Alan.

ok.
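The locking James mentions could look roughly like the sketch below. It is a user-space model using POSIX threads; struct host, its lock, and host_set_state_locked() are hypothetical names, not the SCSI midlayer's. The point is that the check of the old state and the write of the new one happen under one lock, so the eh returning to running and scsi_remove_host moving to cancel cannot interleave: whichever runs second sees the other's result.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical simplified host states (not the kernel's enum). */
enum host_state { HOST_RUNNING, HOST_RECOVERY, HOST_CANCEL, HOST_DEL };

struct host {
	pthread_mutex_t lock;   /* protects state */
	enum host_state state;
};

static bool transition_allowed(enum host_state old, enum host_state new_state)
{
	switch (new_state) {
	case HOST_RUNNING:  return old == HOST_RECOVERY;
	case HOST_RECOVERY: return old == HOST_RUNNING;
	case HOST_CANCEL:   return old == HOST_RUNNING || old == HOST_RECOVERY;
	case HOST_DEL:      return old == HOST_CANCEL;
	}
	return false;
}

/* Check and update under the lock so no other thread can change the
 * state between reading the old value and writing the new one. */
static bool host_set_state_locked(struct host *h, enum host_state new_state)
{
	bool ok;

	pthread_mutex_lock(&h->lock);
	ok = transition_allowed(h->state, new_state);
	if (ok)
		h->state = new_state;
	pthread_mutex_unlock(&h->lock);
	return ok;
}

/* The eh finishing recovery. */
static void *eh_finish(void *arg)
{
	host_set_state_locked(arg, HOST_RUNNING);
	return NULL;
}

/* scsi_remove_host starting its teardown. */
static void *remove_cancel(void *arg)
{
	host_set_state_locked(arg, HOST_CANCEL);
	return NULL;
}

/* Run both transitions concurrently from RECOVERY; return the final state. */
static enum host_state race_once(void)
{
	struct host h;
	pthread_t a, b;

	pthread_mutex_init(&h.lock, NULL);
	h.state = HOST_RECOVERY;
	pthread_create(&a, NULL, eh_finish, &h);
	pthread_create(&b, NULL, remove_cancel, &h);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&h.lock);
	return h.state;
}
```

Starting from RECOVERY, either ordering ends in CANCEL: if the eh goes first, RUNNING to CANCEL is allowed; if the cancel goes first, CANCEL to RUNNING is refused. Without the lock, both threads could read RECOVERY and the eh's write of RUNNING could land last. Build with -pthread.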

> 
> > I still do not understand, as I asked in a previous comment, why we are not
> > shutting down the eh_thread in scsi_remove_host and also why simpler state
> > model updates could not solve the problem.
> 
> Well, it goes back to whether we wait for the thread or not.  To shut
> the thread down, we also need to wait for it to complete.
> 
> As far as the state model goes, we either need to wait for the eh thread
> before transitioning to cancel or introduce the extra states that
> reflect the parallel eh transitions.

I like waiting for the eh thread to complete / shut down prior to
transitioning to cancel.
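A rough user-space model of that ordering, using POSIX threads in place of the kernel's eh thread; struct host, host_init(), host_start_recovery() and host_remove() are all hypothetical. host_remove() asks the eh thread to stop, joins it, and only then moves to CANCEL and DEL, so no recovery can be in flight during those transitions. The eh loop finishes any pending recovery before honoring the stop request. In the kernel, kthread_stop() plays a similar role, since it both asks a kthread to stop and waits for it to exit; whether the SCSI eh thread can use it depends on how that thread is created.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical simplified host states (not the kernel's enum). */
enum host_state { HOST_RUNNING, HOST_RECOVERY, HOST_CANCEL, HOST_DEL };

/* User-space stand-in for a host with its error-handler thread. */
struct host {
	pthread_mutex_t lock;    /* protects state and eh_should_stop */
	pthread_cond_t wake;     /* wakes the eh thread */
	enum host_state state;
	bool eh_should_stop;
	int eh_runs;             /* recovery passes completed */
	pthread_t eh_thread;
};

static void *eh_thread_fn(void *arg)
{
	struct host *h = arg;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		/* Finish pending recovery first, even if asked to stop. */
		if (h->state == HOST_RECOVERY) {
			h->eh_runs++;
			h->state = HOST_RUNNING;
		}
		if (h->eh_should_stop)
			break;
		pthread_cond_wait(&h->wake, &h->lock);
	}
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

static int host_init(struct host *h)
{
	pthread_mutex_init(&h->lock, NULL);
	pthread_cond_init(&h->wake, NULL);
	h->state = HOST_RUNNING;
	h->eh_should_stop = false;
	h->eh_runs = 0;
	return pthread_create(&h->eh_thread, NULL, eh_thread_fn, h);
}

/* Something failed: hand the host to the eh. */
static void host_start_recovery(struct host *h)
{
	pthread_mutex_lock(&h->lock);
	if (h->state == HOST_RUNNING)
		h->state = HOST_RECOVERY;
	pthread_cond_signal(&h->wake);
	pthread_mutex_unlock(&h->lock);
}

/* Stop and wait for the eh thread, then cancel and delete the host. */
static void host_remove(struct host *h)
{
	pthread_mutex_lock(&h->lock);
	h->eh_should_stop = true;
	pthread_cond_signal(&h->wake);
	pthread_mutex_unlock(&h->lock);

	pthread_join(h->eh_thread, NULL);

	/* The eh thread has exited, so nothing can race these transitions. */
	h->state = HOST_CANCEL;
	/* ... device teardown would go here ... */
	h->state = HOST_DEL;

	pthread_cond_destroy(&h->wake);
	pthread_mutex_destroy(&h->lock);
}
```

With a recovery pending when removal starts, the eh completes it exactly once, exits, and the host then moves cleanly to DEL. Build with -pthread.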

> 
> > I believe I also indicated that we could enhance scsi_error to shutdown
> > faster during this state which should only be a performance improvement.
> 
> Yes, we could ... patches?

ok, I will try, but IBM non-coding activities have made me very
unproductive in the patch department lately :-).
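One way the eh could shut down faster, sketched with hypothetical names (struct eh_ctx, eh_run() and the stand-in steps are illustrations, not scsi_error.c code): check whether the host is being removed before each escalation step, and stop instead of moving on to slower resets. The stand-in steps model roughly the abort, device reset, bus reset, host reset escalation and always "fail", so without the check all four would run; simulate_remove_after exists only to fake remove_host arriving mid-recovery.

```c
#include <stdbool.h>
#include <stddef.h>

struct eh_ctx {
	bool host_removing;        /* set once remove_host has begun */
	int steps_run;             /* escalation steps attempted */
	int simulate_remove_after; /* test aid: flag removal after N steps, -1 = never */
};

typedef bool (*eh_step_fn)(struct eh_ctx *ctx);  /* true = recovered */

/* Stand-in for a real recovery step: always fails so escalation continues. */
static bool stand_in_step(struct eh_ctx *ctx)
{
	ctx->steps_run++;
	if (ctx->steps_run == ctx->simulate_remove_after)
		ctx->host_removing = true;  /* remove_host raced in */
	return false;
}

/* Hypothetical escalation order, cheapest first. */
static const eh_step_fn eh_steps[] = {
	stand_in_step,  /* abort outstanding commands */
	stand_in_step,  /* device reset */
	stand_in_step,  /* bus reset */
	stand_in_step,  /* host reset */
};

static void eh_run(struct eh_ctx *ctx)
{
	for (size_t i = 0; i < sizeof(eh_steps) / sizeof(eh_steps[0]); i++) {
		/* Bail out instead of escalating on a host being removed. */
		if (ctx->host_removing)
			return;
		if (eh_steps[i](ctx))
			return;  /* recovered */
	}
}
```

The check costs one flag test per step, so it should only affect how quickly a dying host's eh finishes, which matches treating this as a performance improvement rather than a correctness fix.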

-andmike
--
Michael Anderson
andmike@xxxxxxxxxx